|
"Carved Ellipsoid" by Hassan Sedaghat. |
I always write my commercially available programs, such as DPGraph 2000 (including the install/uninstall programs), entirely in assembly language. This page contains bits of information that I've found to be useful.
Up - Free tools. |
Up - Introduction. |
Everything you need to program in assembly language for DOS or Windows (and probably for UNIX, Linux, etc.) is available for free on the Internet. In this chapter I list how to obtain the newest versions (and sometimes older versions) of the tools that I actually use. The list is heavily slanted toward tools for writing true 32-bit programs for Windows 95/98/NT4 or later, although most of the tools can still create 16-bit programs for DOS or Windows 3.1.
Sometimes a program that works with an older version of a tool stops working
with a newer version. To avoid that problem, I keep different versions of the
tools I use in different subdirectories. These subdirectories all live in a top
level directory called tools
. Right now it looks like this:
Volume in drive C has no label. Volume Serial Number is B099-2C6C Directory of C:\tools [.] [..] [alink-1-6] [cvtres-5-00-1736-1] [h2inc-6-12a] [imagedit-3-10-031] [link-2-60-5046] [link-5-12-8078] [link-5-12-8181] [link-5-13] [masm-6-00b] [masm-6-11c] [masm-6-11d] [masm-6-13-7299] [masm-6-14-8444] [nasm-0-97] [pedump-1993] [rc-3-10] [rc-4-00-1367] [win32lib-1998-10-1] [windbg-4-1381] 21 File(s) 0 bytes 48,501,248 bytes free
I use dashes (minus signs) to separate the version information in subdirectory names instead of periods or underscores. I don't use periods because they are also used to separate file types from file names (for example, example-6-c is less confusing than example.6.c, which some systems will interpret as meaning that example.6 is written in c). I don't use underscores because most browsers underline HTML links, which sometimes hides the underscores (for example, masm_6_14_8444).
In the batch files that I use to build my programs I specifically state which version of each tool to use. Instead of a batch file that looks like this:
ml /coff /c /w /Sn /Sl132 /Sp80 /nologo /X project.asm link @link.tmp
I use something that looks like this:
\tools\masm-6-14-8444\ml /coff /c /w /Sn /Sl132 /Sp80 /nologo /X project.asm \tools\link-5-12-8181\link @link.tmp
In parentheses after the name of each tool in this chapter is the name that I use for that tool's subdirectory.
Up - Microsoft Macro Assembler (masm-6-11d). |
These instructions are nearly the same as those for downloading the Microsoft 32-bit Linker, so you may want to grab a copy of that while you're getting the assembler.
c:\98ddk
c:\98ddk
, then use another name because you might
be mixing old and new versions of files in your current c:\98ddk
directory. The order in which you extract the files doesn't matter. After you
have extracted the files you can drag the icons to the Recycle Bin because you
won't need them anymore.c:\98ddk\setup.exe
c:\tools\masm-6-11d
).
Now copy the following files from c:\98ddk\bin\win98
to
c:\tools\masm-6-11d
:ml.exe
ml.err
c:\98ddk
directory,
but if you don't need any of them, you can uninstall the whole directory.
To uninstall, go to Control Panel, go to Add/Remove Programs, select
Microsoft 98 DDK in the bottom window, then select Add/Remove.Up - Microsoft Macro Assembler (masm-6-14-8444). |
c:\tools\masm-6-14-8444
) and copy all of the
files from c:\tools\masm-6-11d
.
c:\tools\masm-6-14-8444
.c:\tools\masm-6-14-8444\ml614.exe
c:\tools\masm-6-14-8444\patch.exe
Up - Netwide Assembler (nasm-0-98). |
NASM 0.98 is the current full release of the Netwide Assembler. It is available from many sites, for many operating systems. Follow the links from the NASM home page at http://www.web-sites.co.uk/nasm/ for more details. The page http://www.web-sites.co.uk/nasm/where.html suggests convenient places to download NASM, depending on your location on the planet.
Up - Anthony's Linker (alink-1-6). |
You can download the latest version of Anthony's Linker (alink-1-6) for Windows 95/98/NT4 or later from http://members.xoom.com/aajw_/al_al.zip (42 KB). Documentation is included in the zip file.
Anthony's Programming Page at http://alink.home.dhs.org/ has information on other useful programs, including other versions of ALINK and ALINK's source code.
Up - Microsoft 16-bit Linker (link-5-60-339). |
The latest version of the Microsoft 16-bit Linker (link-5-60-339) for
creating DOS programs is available from ftp://ftp.microsoft.com/softlib/MSLFILES/LNK563.EXE
(281 KB). Create a directory for the linker (for example,
c:\tools\link-5-60-339
), download the file to the directory, and then
run LNK563.EXE
to install the linker.
Even though the version number is greater than the 32-bit linker, the 16-bit linker is older.
Up - Microsoft 32-bit Linker (link-5-12-8181). |
These instructions are nearly the same as those for downloading the Microsoft Macro Assembler, so you may want to grab a copy of that while you're getting the linker.
c:\98ddk
c:\98ddk
, then use another name because you might
be mixing old and new versions of files in your current c:\98ddk
directory. The order in which you extract the files doesn't matter. After you
have extracted the files you can drag the icons to the Recycle Bin because you
won't need them anymore.c:\98ddk\setup.exe
c:\tools\link-5-12-8181
).
Now copy the following files from c:\98ddk\bin
to
c:\tools\link-5-12-8181
:link.exe
mspdb50.dll
c:\98ddk
directory,
but if you don't need any of them, you can uninstall the whole directory.
To uninstall, go to Control Panel, go to Add/Remove Programs, select
Microsoft 98 DDK in the bottom window, then select Add/Remove.Up - Documentation. |
Up - Tips and tricks. |
Up - Divide by a constant. |
There are many ways to avoid the expense of an integer multiply instruction
when multiplying by small integer constants. For example, multiplying Eax
by 2 is as easy as:
; Eax = integer. Add Eax,Eax ; Multiply Eax by 2. ; Eax = 2*integer.
Or, multiplying Eax
by 9 is as easy as:
; Eax = integer. Lea Eax,[Eax+8*Eax] ; Multiply Eax by 9. ; Eax = 9*integer.
Unfortunately, there don't seem to be any similar tricks for avoiding the
expense of an integer divide, unless the constant you are dividing by is a power
of 2. For example, to divide an unsigned integer in Edx
by 4, you
could use:
; Edx = unsigned integer. Shr Edx,2 ; Divide Edx by 4. ; Edx = Edx/4, rounded toward 0.
or, if Edx
is a signed integer:
; Edx = signed integer. Sar Edx,2 ; Divide Edx by 4. ; Edx = Edx/4, rounded toward minus infinity.
The closest trick that I have seen for constants that are not powers of two
is to multiply by 2^32/(integer constant), making sure that 2^32/(integer
constant) is rounded towards infinity. For example, to divide Edx
by 10, you could use the following:
; Edx = unsigned integer. Mov Eax,(1 Shl 31)/5+1 ; 2^32/10 = 2^31/5, plus 1 for rounding. Mul Edx ; Multiply Edx by 1/10. ; Edx = Edx/10, rounded toward 0, Eax = destroyed.
On many machines the Mul
instruction isn't all that speedy, so
this trick is of limited usefulness. On the other hand, if you are using
floating point numbers then multiplying by 1/constant is usually much more
efficient than dividing by the constant.
Up - Pop with Esp as base register gotcha. |
Suppose you want to copy a DWord
from somewhere on the stack to
the bottom of the stack. If the DWord
is at offset 16 (for
example) from Esp
, then the simplest way to copy it to the bottom
of the stack is:
; DWord to copy is at [Esp+16]. Push [Esp+16] ; Push DWord at [Esp+16]. ; New copy is at [Esp], original DWord is now at [Esp+20].
Now suppose our program modifies the copied DWord
at
[Esp]
, and we want to copy it back to the original DWord
,
which is now at [Esp+20]
. The naive approach would be to try
something like:
; Modified DWord is at [Esp], original DWord is at [Esp+20]. Pop [Esp+20] ; Pop DWord from [Esp] to [Esp+20] ; Modified DWord is now at [Esp+16].
Unfortunately, this would be wrong. Unlike any other instruction, Pop
calculates the address of the operand as though the instruction has
already executed. That means that Esp
will be 4 more than
expected. The correct code sequence is:
; Modified DWord is at [Esp], original DWord is at [Esp+20]. Pop [Esp+16] ; Pop DWord from [Esp] to [Esp+20] ; Modified DWord is now at [Esp+16].
Up - Test for unsigned integer power of two. |
You can use a classic assembly language trick to test if a unsigned integer is a power of two. In other words, this trick enables you to test an integer to see if one, and only one, bit is set to 1 (and thus the other bits are all set to 0's).
For example, the integers 00100b and 10000b are unsigned integer powers of two because only one bit in each is set to 1, but the integer 11100b is not an unsigned integer power of two because more than one bit is set to 1. The integer 00000b is not an unsigned integer power of two because no bits are set to 1.
The trick relies on a combination of arithmetic and logical operations, and the peculiarities of two's complement arithmetic. The basic idea is that if you bitwise-and an integer with its own negative, and if the result equals the original integer, then one, and only one, bit in the original integer was set to 1, or else the original integer was 0. For example, 00100b gives (-00100b) And 00100b = 11100b And 00100b = 00100b, so 00100b is an unsigned integer power of 2. For another example, 11100b gives (-11100b) And 11100b = 00100b And 11100b = 00100b, so 11100b is not an unsigned integer power of 2. The result of bitwise-anding an integer with its own negative gives the lowest order 1 bit in the integer, or 0 if the integer was 0.
The following code snippet uses the Eax
and Ebx
registers, but any two integer registers excluding Esp
will
work. To sanitize the source code for inclusion in this HTML file, I changed
all "<" characters to "<", and all ">" characters to ">".
If you use this code in your program, you might want to reverse those
transformations.
; Eax contains integer to test, Ebx = scratch register. Xor Ebx,Ebx ; Eax = integer, Ebx = 0. Sub Ebx,Eax ; Eax = integer, Ebx = -integer. Jz Zero ; If -integer = 0, integer = 0. And Ebx,Eax ; Eax = integer, Ebx = (-integer) And integer. Cmp Ebx,Eax ; Eax = integer, Ebx = (-integer) And integer. Jne NotPowerOfTwo ; If integer<>(-integer) And integer, multiple 1 bits. ... ; If integer=(-integer) And integer, only one 1 bit. ; Eax contains integer to test, Ebx = lowest order 1 bit in Eax, else 0.
If you don't need the lowest order 1 bit at the end of the sequence, then you can use a shorter alternative. It is based on the fact that if you bitwise-and an integer with one less than the integer, then the result is zero only if the integer was a power of two, or if the integer was zero.
; Eax contains integer to test, Ebx = scratch register. Lea Ebx,[Eax-1] ; Eax = integer, Ebx = integer-1. And Ebx,Eax ; Eax = integer, Ebx = (integer-1) And integer. Jnz NotPowerOfTwo ; If result is non-zero, integer is not 0 or power of 2. Test Eax,Eax ; Was integer 0? Jz Zero ; Yep. ... ; Integer was a power of 2 (i.e. one and only one 1 bit was set). ; Eax contains integer to test, Ebx = integer minus lowest order 1 bit, else 0.
Up - Double precision unsigned integer division. |
Here's a tricky algorithm I use when I need to find an exact double precision unsigned integer quotient and remainder. The place I seem to use it the most is in 3D edge clipping.
To sanitize the source code for inclusion in this HTML file, I changed all "<" characters to "<", and all ">" characters to ">". If you use this code in your program, you might want to reverse those transformations.
One simple way you might want to customize this code is to check at the beginning if both operands are single precision (right now it only checks to see if the divisor is single precision), and in that case performing only one single precision divide. I didn't do it when I wrote this code because the dividends I was dealing with at the time were almost never single precision.
; Divides the double precision quantity a1*b+a2 by c1*b+c2, where b is the base ; (in this case 2^32-1), and a1, a2, c1, and c2 are all in the range 0 to b-1, ; inclusive. If both c1 and c2 are zero, then a divide exception occurs. In ; all other cases, a double precision quotient and remainder are returned. ; ; By David Parker, from www.davidparker.com/assembly.html ; ; Expects on entry: Eax=c2 ; Ebx=a2 ; Ecx=a1 ; Edx=c1 ; ; Returns on exit: Eax=lo dword of quotient. ; Ebx=lo dword of remainder. ; Ecx=hi dword of remainder. ; Edx=hi dword of quotient. ; ; The algorithm handles the division using 3 special cases. ; ; CASE 1. ; ; The first case is if c1=0. ; We want to find numbers q1 and r1 such that ; ; (a1*b+a2) = q1*c2 + r1, where 0 <= r1 < c2. ; ; We note that 0 <= q1 <= (b-1)*b+(b-1). The minimum occurs, for example, ; when a1=a2=0, and c2 = 1. The maximum occurs when a1=a2=(b-1) and c2=1. ; We first divide a1 by c2 to get q2 and r2 such that: ; ; a1 = q2*c2 + r2, where 0 <= r2 < c2. ; ; The bounds on q2 are 0 <= q2 <= (b-1), the minimum occurring at a2=0, ; the maximum at a1=(b-1), c2=1. ; We then divide r2*b+a2 by c2 to get q3 and r3 such that: ; ; r2*b + a2 = q3*c2 + r3, where 0<=r3<c2. ; ; The bounds on q3 are 0 <= q3 <= (b-1), the minimum occurring at r2=a2=0, ; the maximum at r2=(c2-1), a2=(b-1). ; Starting from (a1*b+a2), we then start substituting: ; ; (a1*b+a2) ; = (q2*c2+r2)*b + a2 ; = q2*c2*b + r2*b + a2 ; = q2*c2*b + q3*c2 + r3 ; = (q2*b+q3)*c2 + r3. ; ; Comparing this to the equation for q1 and r1, we see that since 0 <= r3 < c2, ; that q1=(q2*b+q3) and r1=r3. ; ; CASE 2. ; ; The second special case is if 1 <= c1 < (b-1). In this case, we want to find ; q1 and r1 such that: ; ; a1*b+a2 = q1*(c1*b+c2) + r1, where 0 <= r1 < (c1*b+c2). ; ; We note that 0 <= q1 <= (b-1), the minimum occurring if a1=a2=0, c1=c2=(b-1), ; and the maximum occurring when c1=1,c2=0,a1=a2=(b-1). ; First, we divide (a1*b+a2) by (c1+1) to get q2 and r2 such that: ; ; (a1*b+a2) = q2*(c1+1) + r2, where 0 <= r2 < (c1+1). ; ; Q2 may be a double precision quantity. ; Next, we divide (c1*b+c2) by (c1+1) to get q3 and r3 such that: ; ; (c1*b+c2) = q3*(c1+1) - r3, where 0 <= r3 < (c1+1). ; ; We note that 1 <= q3 <= b. ; The minus sign in front of r3 is important, so that later things come out ; positive. ; ; Next, we divide q2 by q3 (which, if q3 = b, means we don't have to divide), ; to get q4 and r4 such that: ; ; q2 = q4*q3 + r4, where 0 <= r4 < q3. ; ; Substituting in this last equation for q2 and q3, and rearranging a little ; gives us: ; ; (a1*b+a2) = q4*(c1*b+c2) + q4*r3 + r2 + r4*(c1+1). ; ; We will call this last remainder r5: r5 = q4*r3 + r2 + r4*(c1+1). ; We will now show that either q4=q1, or q4=q1-1. We note that since all the ; terms in r5 are positive, we have that q4 <= q1. To see how large r5 can ; get, we first note that since r4<q3, then: ; ; r5< q4*r3 + r2 + (c1*b+c2+r3)/(c1+1)*(c1*1), or ; r5< q4*r3 + r2 + (c1*b+c2+r3), or ; r5< (q4+1)*r3 + r2 + (c1*b+c2). ; ; Since r3 and r2 are both < (c1+1), we can substitute c1 for them to get ; ; r5< (q4+1)*c1 + c1 + (c1*b +c2), or ; r5< (q4+2)*c1 + (c1*b+c2). ******************************* ; ; Now, since q4 <= q1 and q1 <= (b-1), we have that q4 <= (b-1), so that ; ; r5 < (b+1)*c1 + (c1*b+c2), or ; r5 < c1*b + c1 + c1*b + c2, or ; r5 < 2*c1*b + c1 + c2. ; ; Finally, since c1*(b-1)+2*c2 > 0, we have ; ; r5 < 2*c1*b + c1 + c2 + c1*(b-1) + 2*c2, or ; r5 < 3*c1*b + 3*c2, or ; r5 < 3*(c1*b+c2). ; ; Thus, we must have that q1-2 <= q4 <= q1. However, we can examine more ; closely the case that q1-2=q4 to show that it never occurs. If we assume ; that q4=q1-2, then we must have that r5 >= 2*(c1*b+c2). Repeating the ; equation with the long line of asterisks, we know that ; ; r5 < (q4+2)*c1 + (c1*b+c2). ; ; If we assume that q4=q1-2, then we have ; ; r5 < q1*c1 + (c1*b+c2). ; ; We know that q1 <= (b-1), so we have ; ; r5 < (b-1)*c1 + (c1*b+c2), or ; r5 < 2*c1*b - c1 + c2. ; ; Since c1+c2 >= 0, we have that ; ; r5 < 2*c1*b - c1 + c2 + c1 + c2, or ; r5 < 2*c1*b + 2*c2, or ; r5 < 2*(c1*b+c2). ; ; This contradicts our assumption that q4=q1-2, so that case can never occur. ; It is fairly easy to construct cases where q4=q1-1, so our final result for ; the case 1 <= c1 < (b-1) is that ; ; q1-1 <= q4 <= q1, and 0 <= r5 < 2*(c1*b+c2). ; ; Thus, once we have calculated q4 and r5, we can calculate q1 and r1 by ; comparing r5 to (c1*b+c2), and if r5 >= (c1*b+c2), we increment q4 by 1 and ; decrement r5 by (c1*b+c2). ; ; There is some trickiness involved in calculating the remainder r5. First, we ; need to show that r5 never exceeds the value that can be stored as a ; double precision quantity. We can start by determining a bound on q1 in ; terms of c1. We have: ; ; q1=(a1*b+a2-r1)/(c1*b+c2) ; ; q1 will be at a maximum when the numerator is as large as possible (so that ; a1=a2=(b-1) and r1=0), and the denominator is as small as possible (so that ; c2=0). Thus the maximal value of q1 for a given value of c1 is ; ; q1=((b-1)*b+(b-1))/(c1*b) ; ; Knowing that the maximal values for r2, r3, and r4 are c1, c1, and (b-1), ; respectively, and that q4<=q1, we can plug these into the expression for r5 ; to get: ; ; r5<=((b-1)*b+(b-1))/(c1*b)*c1 + c1 + (b-1)*(c1+1), or ; r5<=(c1+2)*b-1-1/b ; ; Since the maximum possible value for c1 is (b-2), we then have ; ; r5<=b^2-1-1/b ; ; Since 1/b is positive, we can add it to the right so that ; ; r5<b^2-1 ; ; Noting that the maximum double precision value is (b-1)*b+(b-1)=b^2-1, we ; see that calculating r5 will never overflow double precision. ; ; We can make the calculation of r5 slightly more efficient by noting that the ; term q4*r3 never overflows single precision. We have to start: ; ; q4*r3<=q1*r3 ; ; Using our above limit on q1 in terms of c1, we have ; ; q4*r3<=((b-1)*b+(b-1))/(c1*b)*r3 ; ; Using the fact that 0<=r3<(c1+1), we can substitute c1 for r3 to get: ; ; q4*r3<=((b-1)*b+(b-1))/(c1*b)*c1, or ; q4*r3<=((b-1)*b+(b-1))/b, or ; q4*r3<=b-1/b. ; ; Since 1/b is positive, we have ; ; q4*r3<b, or ; q4*r3<=(b-1) ; ; Since b-1 is the maximum single precision integer, we see that q4*r3 never ; overflows single precision. ; ; CASE 3. ; ; The third case is if c1=(b-1). ; Then, since 0<=a1<=(b-1), we have one of the two possibilities: ; ; If a1=(b-1) and a2>=c2, then q1=1 and r1=a2-c2, otherwise ; ; q1=0, and r1=a1*b+a2. DivDDDD_EcxEbxDDEdxEaxDR_EdxEaxQUEcxEbxRM: Test Edx,Edx ; Is c1 zero? Jnz DivDDDD_Case2 ; Nope, go try something else. Xchg Eax,Ecx ; Yep, Eax=a1,Ebx=a2,Ecx=c2,Edx=0. Div Ecx ; Eax=q2,Ebx=a2,Ecx=c2,Edx=r2. Xchg Eax,Ebx ; Eax=a2,Ebx=q2,Ecx=c2,Edx=r2. Div Ecx ; Eax=q3,Ebx=q2,Ecx=c2,Edx=r3. Xchg Ebx,Edx ; Eax=q3,Ebx=r3,Ecx=c2,Edx=q2. Xor Ecx,Ecx ; Eax=q3,Ebx=r3,Ecx=0,Edx=q2. Retn ; Return to caller. DivDDDD_Case2: Inc Edx ; Eax=c2,Ebx=a2,Ecx=a1,Edx=c1+1. Jz DivDDDD_Case3 ; If c1=65535 (c1+1=65536=0), go to case 3. Push Esi ; Stack Esi Mov Esi,Eax ; Eax=c2,Ebx=a2,Ecx=a1,Edx=c1+1,Esi=c2. Push Edi ; Stack Edi. Mov Edi,Edx ; Eax=c2,Ebx=a2,Ecx=a1,Edx=c1+1,Esi=c2,Edi=c1+1. Push Ebp ; Stack Ebp. Xor Edx,Edx ; Eax=a1,Ebx=a2,Ecx=c2,Edx=0,Esi=c2,Edi=c1+1. Xchg Eax,Ecx ; Eax=a1,Ebx=a2,Ecx=c2,Edx=c1+1,Esi=c2,Edi=c1+1. Div Edi ; Eax=Q2,Ebx=a2,Ecx=c2,Edx=Ecx,Esi=c2,Edi=c1+1. Xchg Eax,Ebx ; Eax=a2,Ebx=Q2,Ecx=c2,Edx=Ecx,Esi=c2,Edi=c1+1. Div Edi ; Eax=q2,Ebx=Q2,Ecx=c2,Edx=r2,Esi=c2,Edi=c1+1. Xchg Eax,Ecx ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=r2,Esi=c2,Edi=c1+1. Mov Ebp,Edx ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=r2,Esi=c2,Edi=c1+1,Ebp=r2. Mov Edx,Edi ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=c1+1,Esi=c2,Edi=c1+1,Ebp=r2. Dec Edx ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=c1,Esi=c2,Edi=c1+1,Ebp=r2. Div Edi ; Eax=tq3,Ebx=Q2,Ecx=q2,Edx=tr3,Esi=c2,Edi=c1+1,Ebp=r2. Xchg Ebx,Edx ; Eax=tq3,Ebx=tr3,Ecx=q2,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Xchg Ecx,Eax ; Eax=q2,Ebx=tr3,Ecx=tq3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Test Ebx,Ebx ; Is r3 zero? Jz DivDDDD_DoDiv ; Yep, no need to invert r3 or increment q3. Sub Ebx,Edi ; Eax=q2,Ebx=-r3,Ecx=tq3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Neg Ebx ; Eax=q2,Ebx=r3,Ecx=tq3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Inc Ecx ; Eax=q2,Ebx=r3,Ecx=q3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Jz DivDDDD_Rem ; If q3=65536, no need to do next division. DivDDDD_DoDiv: Div Ecx ; Eax=q4,Ebx=r3,Ecx=q3,Edx=r4,Esi=c2,Edi=c1+1,Ebp=r2. Xchg Eax,Edx ; Eax=r4,Ebx=r3,Ecx=q3,Edx=q4,Esi=c2,Edi=c1+1,Ebp=r2. DivDDDD_Rem: Xchg Eax,Ebx ; Eax=r3,Ebx=r4,Ecx=q3,Edx=q4,Esi=c2,Edi=c1+1,Ebp=r2. Mov Ecx,Edx ; Eax=r3,Ebx=r4,Ecx=q4,Edx=q4,Esi=c2,Edi=c1+1,Ebp=r2. Mul Edx ; Eax=r3*q4,Ebx=r4,Ecx=q4,Edx=0,Esi=c2,Edi=c1+1,Ebp=r2. Add Eax,Ebp ; Eax=r3*q4+r2,Ebx=r4,Ecx=q4,Edx=0,Esi=c2,Edi=c1+1. Adc Edx,Edx ; Eax=r3*q4+r2,Ebx=r4,Ecx=q4,Edx=Edx*Q4+Ecx,Esi=c2, ; Edi=c1+1. Mov Ebp,Ecx ; Eax=r3*q4+r2,Ebx=r4,Ecx=q4,Edx=Edx*Q4+Ecx,Esi=c2, ; Edi=c1+1,Ebp=q4. Mov Ecx,Edx ; Eax=r3*q4+r2,Ebx=r4,Ecx=Edx*Q4+Ecx,Edx=Edx*Q4+Ecx, ; Esi=c2,Edi=c1+1,Ebp=q4. Xchg Eax,Ebx ; Eax=r4,Ebx=r3*q4+r2,Ecx=Edx*Q4+Ecx,Edx=Edx*Q4+Ecx, ; Esi=c2,Edi=c1+1,Ebp=q4. Mul Edi ; Eax=r4*(c1+1),Ebx=r3*q4+r2,Ecx=Edx*Q4+Ecx, Xchg Eax,Ebp ; Eax=q4,Ebx=r5,Ecx=Edi,Edx=Esi*(C1+1),Esi=c2,Edi=c1+1. ; Edx=Esi*(C1+1),Esi=c2,Edi=c1+1,Ebp=q4. Dec Edi ; Eax=q4,Ebx=r5,Ecx=Edi,Edx=0,Esi=c2,Edi=c1. Add Ebx,Ebp ; Eax=r4*(c1+1),Ebx=r5,Ecx=Edx*Q4+Ecx,Edx=Esi*(C1+1), ; Esi=c2,Edi=c1+1,Ebp=q4. Adc Ecx,Edx ; Eax=r4*(c1+1),Ebx=r5,Ecx=Edi,Edx=Esi*(C1+1),Esi=c2, ; Edi=c1+1,Ebp=q4. Xor Edx,Edx ; Eax=q4,Ebx=r5,Ecx=Edi,Edx=0,Esi=c2,Edi=c1+1. Pop Ebp ; Unstack Ebp. Cmp Ecx,Edi ; Compare Edi to c1. Ja DivDDDD_DecRe ; If Edi>c1, then we need to decrease remainder. Jb DivDDDD_RemOk ; If Edi<c1, then the remainder is okay. Cmp Ebx,Esi ; Compare r5 to c2. Jb DivDDDD_RemOk ; If r5<c2, then the remainder is okay. DivDDDD_DecRe: Inc Eax ; Eax=q4+1,Ebx=r5,Ecx=Edi,Edx=0,Esi=c2,Edi=c1. Sub Ebx,Esi ; Eax=q4+1,Ebx=r5-c2,Ecx=Edi,Edx=0,Edi=c1. Sbb Ecx,Edi ; Eax=q4+1,Ebx=r5-c2,Ecx=Edi-c1,Edx=0. DivDDDD_RemOk: Pop Edi ; Unstack Edi. Pop Esi ; Unstack Esi. Retn ; Return to caller. DivDDDD_Case3: Inc Ecx ; Eax=c2,Ebx=a2,Ecx=a1+1,Edx=0. Jnz DivDDDD_Q02 ; If a1<65535, then quotient is 0. Sub Ebx,Eax ; Eax=c2,Ebx=a2-c2,Ecx=0,Edx=0. Jb DivDDDD_Q01 ; If a2<c2, then quotient is 0. Mov Eax,1 ; Eax=1,Ebx=a2-c2,Ecx=0,Edx=0. Retn ; Return to caller. DivDDDD_Q01: Add Ebx,Eax ; Eax=c2,Ebx=a2,Ecx=a1+1,Edx=0. DivDDDD_Q02: Dec Ecx ; Eax=c2,Ebx=a2,Ecx=a1,Edx=0. Mov Eax,Edx ; Eax=0,Ebx=a2,Ecx=a1,Edx=0. Retn ; Return to caller.
Up - Saturated unsigned integer addition. |
When adding two unsigned integers, if the answer overflows we sometimes want to reset the result back to a maximum value. This is called saturated addition, and it is often used in graphics operations. For example, if we are adding two color intensities, each of which ranges from 0 to 255, and the answer exceeds 255, then we usually want to reset the sum back to 255.
Many of the newer Intel-compatible CPU's support saturated addition in hardware via MMX instructions. However, detecting and invoking the MMX instructions is often more of a hassle than it's worth, and the MMX instructions aren't even supported on many older CPU's, such as Intel's original Pentium.
Fortunately, saturated addition is fairly efficient to implement using standard instructions if the saturation value is 255 in the case of byte operations, 65535 in the case of word operations, or 2^32-1 in the case of dword operations. Here is the code for the byte case; the code is analogous for the word and dword cases:
; Al contains sum, Bl contains value to add. Add Al,Bl ; Increment sum. Sbb Bl,Bl ; Bl = 0 if no overflow (no carry), or 255 if overflow. Or Al,Bl ; Al = sum if no overflow (no carry), or 255 if overflow. ; Al contains sum or 255, Bl contains 0 or 255.
Up - Absolute value of an integer. |
One of the classic assembly language tricks is taking the absolute value of a 2's complement integer. The naive method would be:
; Eax contains integer. Test Eax,Eax ; Is Eax negative? Jns GotAbsoluteValue ; If sign bit is 0, integer is already non-negative. Neg Eax ; Negate a negative integer to get a positive integer. GotAbsoluteValue: ; Eax contains abs(integer).
This method is good for clarity and for using the minimum number of registers, but is bad for speed because of the test and jump. The classic trick is:
; Eax contains integer. Cdq ; Edx = -1 if Eax negative, Edx = 0 otherwise. Xor Eax,Edx ; Eax = -integer-1 if Eax negative, Eax = integer otherwise. Sub Eax,Edx ; Eax = -integer if Eax negative, Eax = integer otherwise. ; Eax contains abs(integer), Edx = -1 if Eax was negative, Edx = 0 otherwise.
This method is bad for clarity (although most experienced assembly
language programmers would recognize it as a common idiom) and for register
usage, but is good for speed because it gets rid of the test and jump. Since
the Cdq
instruction works only with the Eax
and
Edx
registers, a variation that will work with any pair of registers
is:
; Eax contains integer. Mov Edx,Eax ; Copy integer. Sar Edx,31 ; Edx = -1 if Eax negative, Edx = 0 otherwise. Xor Eax,Edx ; Eax = -integer-1 if Eax negative, Eax = integer otherwise. Sub Eax,Edx ; Eax = -integer if Eax negative, Eax = integer otherwise. ; Eax contains abs(integer), Edx = -1 if Eax was negative, Edx = 0 otherwise.
And as a final thought, sometimes the full absolute value isn't needed. See, for example, Divide a signed dividend by an unsigned divisor. For that tip/trick, if the integer is positive or zero we want to leave the integer unchanged (just as for the absolute value). However, if the integer is negative what we want is the absolute value minus 1. So, we need only the first three instructions above:
; Eax contains integer. Mov Edx,Eax ; Copy integer. Sar Edx,31 ; Edx = -1 if Eax negative, Edx = 0 otherwise. Xor Eax,Edx ; Eax = -integer-1 if Eax negative, Eax = integer otherwise. ; Eax contains result, Edx = -1 if Eax was negative, Edx = 0 otherwise.
Up - Convert from binary to ASCII hex. |
Here's a clever trick that I read about in the Assembly Programming Journal,
February/March 1999. Suppose you have a binary number from 0h to 0Fh
in Al
, and you want to convert it to an ASCII character from "0" to
"9" or "A" to "F". The following sequence of four instructions (six bytes) will
do it:
; Al is from 0h to 0Fh. Add Al,90h ; Al is from 90h to 09Fh. Daa ; Al is from 90h to 99h (carry clear) or 00h to 06h (carry set). Adc Al,40h ; Al is from 0D0h to 0D9h or 41h to 46h. Daa ; Al is from 30h to 39h ("0" to "9") or 41h to 46h ("A" to "F").
Up - Divide a signed dividend by an unsigned divisor. |
Suppose you want to divide a signed dividend (numerator) by a non-zero unsigned divisor (denominator), to get a signed quotient and a non-negative remainder. In other words, given Num and Den where
-2^31 <= Num <= 2^31-1 and 0 < Den <= 2^32-1
you would like to calculate Quo and Rem such that
Num = Quo*Den + Rem and 0 <= Rem < Den.
Unfortunately, the Idiv
instruction can't do this kind of
division, because if Num is negative, Idiv
will return a negative
or zero remainder. Also, the Idiv
instruction would require
narrowing the range of Den to 0 < Den <= 2^31-1.
Here is the algorithm I use. The basic idea is to convert Num to a positive
value if it is negative, perform unsigned division to obtain an intermediate
Quo' and Rem', and then correct Quo' and Rem' if Num was originally negative.
The algorithm involves no jump instructions, and so won't break the pipeline.
It requires two extra registers in addition to Eax
and
Edx
. The two extra registers can be any two of Ebx
,
Ecx
, Esi
, Edi
, and Ebp
. In the
following code snippet, I use Esi
and Edi
.
; Eax = Num; Esi = Den. Mov Edi,Eax ; Copy Num. Sar Edi,31 ; If Num < 0 then Edi = -1 else Edi = 0. Xor Edx,Edx ; Zero hi dword of dividend. Xor Eax,Edi ; If Num < 0 then Eax = -Num-1 else Eax = Num. Div Esi ; If Num < 0 then Quo'*Den + Rem' = -Num-1 ; else Quo'*Den + Rem' = Num. Xor Eax,Edi ; If Num < 0 then Eax=Quo=-Quo'-1 else Eax=Quo=Quo'. Xor Edx,Edi ; If Num < 0 then Edx = -Rem'-1 else Edx = Rem'. And Edi,Esi ; If Num < 0 then Edi = Den else Edi = 0. Add Edx,Edi ; If Num < 0 then Edx=Rem=Den-Rem'-1 else Edx=Rem=Rem'. ; Eax = Quo; Edx = Rem; Esi = Den; Edi = Den if Num < 0 else Edi = 0.
You can save a few cycles if you know that -2^15 <= Num <= 2^15-1 and 0
< Den <= 2^16-1 (i.e. that Num is a signed 16-bit integer and Den is a
non-zero unsigned 16-bit integer). If so, change Div Esi
to
Div Si
to use faster 16-bit division, and leave the other instructions
unchanged.
To show that the algorithm works as advertised, first note that in the case
of a non-negative Num the algorithm reduces to simple unsigned division so there
is nothing to prove. In the case of a negative Num, we first note that -Num-1
is non-negative if Num is negative. Hence, the result of the Div
instruction is a pair of non-negative intermediate numbers Quo' and Rem'
which satisfy
-Num-1 = Quo'*Den + Rem' and 0 <= Rem' < Den.
We then calculate the true Quo and Rem using
Quo = -Quo'-1 and Rem = Den-Rem'-1.
Solving for Quo' and Rem' in terms of Quo and Rem, and plugging into the equation for -Num-1 gives
-Num-1 = (-Quo-1)*Den + (Den-Rem-1)
which simplifies to
Num = Quo*Den + Rem
Since 0 <= Rem' < Den, then 0 <= Den-Rem'-1 < Den, and so 0 <= Rem < Den, which shows that the algorithm works as advertised.
Up - Use the parity flag to test three values with one instruction. |
One bit can store two values: 0b or 1b. Two bits can store four values: 00b, 01b, 10b, or 11b. Sometimes, however, you only need to store three values. If so, you can take advantage of the parity flag to check for the three values using a single instruction. If you let the three values be 00b, 11b, and 01b (or 10b), then you can use the Jpo (jump on parity odd) instruction to do something like:
Test Al,11b ; Test bottom two bits of Al. Jpo Value01bOr10b ; 01b or 10b in bottom two bits. Jnz Value11b ; 11b in bottom two bits. ... ; Fall through to here if 00b in bottom two bits.
Note that the order of the jump instructions is important, and also note the
Parity flag gotcha below -
both of the bits you are testing must be in the same byte. Putting the value
into the Al
register saves a byte of code, because testing
the Al
register takes only two bytes of code, while testing any
other eight-bit register takes three bytes of code. You could have used a
Test Eax,03h
instruction, but that takes five bytes of code, and testing
any other 32-bit register takes six bytes of code.
Up - Parity flag gotcha. |
Instructions that alter the parity flag clear it to 0 if there are an odd
number of bits in the lowest order byte, and set it to 1 if there are an even
number of bits in the lowest order byte. That may seem backwards, but the flag
is technically a "parity even" flag so it gets set if there are an even number
of bits in the lowest order byte. The gotcha, however, is in the words "lowest
order byte". Suppose the Eax
register contains the value
0101h
(i.e. 1 in the low order byte, and 1 in the second lowest order
byte). If you test the whole Eax
register using the Test
Eax,Eax
instruction, you might expect that the parity flag would get set
to 1 because there are an even number of bits in the Eax
register.
However, the parity flag gets cleared to 0 because the parity flag depends only
on the number of bits in the lowest order byte. This is always true for any
instruction that alters the parity flag; only the bits in the lowest order byte
affect the parity flag. So, when using the parity flag for a test, it is often
best to use eight-bit registers or single byte memory operands.
Up - Copyright © 1999 by David Parker. All rights reserved. |