PLEASE read Chapter 1 for legal terms and conditions, how to register for the package, and an overview of the assembler.
The A86 package consists of the program A86.COM, a collection of source, batch, and library files used by the demonstration contained in Chapter 2, and this manual A86MANU.TXT.
1 INTRODUCTION AND LEGAL TERMS
  Introduction
  Legal Terms and Conditions
  Registration Benefits
  Overview of A86
  About the Author
  How to Contact Me
2 A86 DEMONSTRATION
  Assembling a Very Short Program: PAGE.COM
  Demonstration of Error Reporting
  Assembling a Longer Program with Library Files: REV.COM
  Assembling a medium-sized program: TCOLS.COM
3 OPERATION AND REQUIREMENTS
  Creating Programs to Assemble
  Program Invocation
  Assembler Switches
  The A86 Environment Variable
  Using Standard Input as a Command Tail
  Strategies for Source File Maintenance
  System Requirements for A86
4 ELEMENTS OF THE A86 LANGUAGE
  General Categories of A86 Elements
  Operand Typing and Code Generation
  Registers
  Variables
  Labels
  Constants
  Generating Opcodes from General Purpose Mnemonics
5 SOME EXCLUSIVE FEATURES OF A86
  The IF Statement
  Multiple operands to PUSH, POP, INC, DEC
  Repeat Counts to String Instructions
  Conditional Return Instructions
  A86 extensions to the MOV and XCHG instructions
  Local Symbols
  Operands to AAM and AAD Instructions
  Single-Operand Forms of the TEST Instruction
  Optimized LEA Instruction
6 THE 86 INSTRUCTION SET
  Effective Addresses
  Segmentation and Effective Addresses
  Effective Use of Effective Addresses
  Encoding of Effective Addresses
  How to Read the Instruction Set Chart
  The 86 Instruction Set
7 FLOATING-POINT INSTRUCTIONS
  87-Family Coprocessors
  Extra Coprocessor Support
  Emulating the 8087 by Software
  The Floating Point Stack
  Floating Point Initializations
  Built-In Constant Names
  Special Immediate FLD Form
  Floating Point Operand Types
  Operand Choices in A86
  The 87 Instruction Set
8 NUMBERS AND EXPRESSIONS
  Numbers and Bases
  The RADIX Directive
  Floating Point Initializations
  Overview of Expressions
  Types of Expression Operands
  Descriptions of Operators and Specifiers
  HIGH operand LOW operand
  operand BY operand
  Addition (combination): operand + operand operand.operand
  Subtraction: operand - operand
  Multiplication and Division: operand * operand operand / operand
  Shifting: operand SHR operand operand SHL operand BIT operand
  Logical: operand OR operand operand XOR operand
  Boolean negation: ! operand
  Relational: operand EQ operand operand LT operand operand LE operand
  String Comparison: string EQ string string NE string string = string
  Memory Variable Specifiers: memletter operand operand memletter
  SHORT label LONG label
  OFFSET var-name
  NEAR operand
  Square Brackets: [operand]
  Colon Operator: const:operand segreg:operand segname:operand
  ST operand operand ST
  REF symbol DEF symbol
  TYPE operand
  THIS and $ Specifiers
  ELSE default_op new_op ELSE default_op
  Operator Precedence
9 DIRECTIVES IN A86
  Segments in A86
  CODE ENDS DATA ENDS
  ORG address
  EVEN constant
  Data Allocation Using DB, DW, DD, DQ, DT, and DP
  The STRUC Directive
  Forward References
  Forward References in Expressions
  The EQU Directive: symbol_name EQU expression
  Equates to Built-In Symbols
  The NIL Prefix
  Interrupt Equates
  Duplicate Definitions
  The = Directive: symbol_name = expression
  name PROC NEAR name PROC FAR name PROC [name] ENDP
  name LABEL memletter
  USE16 USE32 FLAT
  INCLUDE file_name
10 RELOCATION AND LINKAGE
  .OBJ Production Made Easy
  Overview of Relocation and Linkage
  NAME module_name
  PUBLIC sym1, sym2, sym3, ...
  EXTRN sym1:type, sym2:type, sym3:type, ...
  MAIN: The Starting Location for a Program
  END start_addr
  seg_name SEGMENT [align] [combine] [use] ['class_name']
  DATA SEGMENT, STRUC and CODE SEGMENT Directives
  [seg_name] ENDS
  Default Outer SEGMENT
  group_name GROUP seg_name1,seg_name2,...
  SEG operand
11 MACROS AND CONDITIONAL ASSEMBLY
  Macro Facility
  Simple Macro Syntax
  Formatting in Macro Definitions and Calls
  Macro Operand Substitution
  Quoted String Operands
  Looping by Operands in Macros
  The #L Last Operator and Indefinite Repeats
  Character Loops
  The "B"-Before and "A"-After Operators
  Multiple Increments within Loops
  Negative R-loops
  Nesting of Loops in Macros
  Implied Closing of Loops
  Passing Operands by Value
  Passing Operand Size
  Generating the Number of an Operand
  Parenthesized Operand Numbers
  Exiting from the Middle of a Macro
  Local Labels in Macros
  Debugging Macro Expansions
  Conditional Assembly
  Conditional Assembly and Macros
  Simulating MASM's Conditional Assembly Constructs
  Declaring Variables in the Assembler Invocation
  Null Invocation Variable Names
  Changing Values of Invocation Variables
12 COMPATIBILITY WITH OTHER ASSEMBLERS
  Conversion of MASM programs to A86
  Compatibility symbols recognized by A86
  Conversion of A86 Programs to Intel/IBM/MASM
13 ASSOCIATED TOOLS AND OUTPUT FILES
  Listings with A86
  Listing Control Directives
  Cross-reference Facility
  A86LIB Source File Library Tool
  Using A86.LIB in A86 Assemblies
  Environment Variable A86LIB
  Forcing a Library Search
14 DESCRIPTIONS OF A86 ERROR MESSAGES
15 RELEASE HISTORY OF A86
16 A86 RESERVED SYMBOLS
INDEX
A86 is the finest assembler available, at any cost under any terms, for the 86-family of microprocessors (IBM-PC, compatibles, and not-so-compatibles). In contrast to software firms who attempt to restrict the distribution of their products via protection-schemes, I encourage free distribution, and trust that those who use my products will pay for them.
Please keep in mind the fundamental good spirit of free-distribution software as you endure the following barrage of legalities. Then evaluate the outstanding value that the A86 package offers you. I assure you that you will not be disappointed.
This package is provided to you under the following conditions:
Thank you for enduring the legalities. They are there to protect me, and also to convince you that this is my business, from which I make my living. I'll now return to a softer sell, to try to make you want to register for my products.
There is a certain amount of ambiguity about when you're still evaluating A86, and when you're really using A86 and should register for it. Some cases are clear (e.g., you're a school using A86 to teach a course); but many are not. In practical reality, it's up to you to decide: you are "on your honor". Also in practical reality, most users who ought to register haven't, yet. For most, it's not dishonesty but merely procrastination. So I have provided some incentives, to prod you into registering.
One incentive is the printed manual, which only registered users can purchase. I haven't left anything out of the disk version of the manual, but the printed version is formatted and bound much more nicely than if you print it yourself.
Another incentive, included if you register both A86 and D86, is the A386/D386 package, for 32-bit programming. At this writing, A386 covers all instructions through the Pentium II, including MMX, AMD 3DNow, and the not-yet-released KNI instruction set. A386 also allows coding for USE32 and FLAT mode segments.
Another incentive is the tool A86LIB.COM, that lets you create libraries of source files, to be automatically searched by A86 whenever your program has undefined symbols. This means you can effectively add procedures of arbitrary power and complexity to A86's language.
I am now offering my personal library of source files, for sale to registered A86 users, for an additional $50 when purchased with a registration or update; $60 when purchased later. The library includes over 100 library source files, plus the source code to over 100 working small programs that use the library. The package consists of over 800K of source code, comprising over 24000 lines, all written by me. The code concerns itself with file I/O, numeric and string I/O, command-line parsing including wildcards, buffering, string searching and manipulation, sorting, and random-number generation. This code is well-commented, but is provided on an "as-is" basis -- I will not answer any questions about how the code works. I have hesitated to release this package because I was afraid of a huge support burden; hence, I am disclaiming any support!
Finally, there are the intangible incentives. You know you've done the right thing. You're letting me know that you appreciate what I've done. You're letting the world know that quality software can succeed when distributed as shareware.
A86 accepts assembly language source files, and transforms them directly into either: (1) .COM files executable under MS-DOS, starting at offset 0100 within a code segment; (2) .OBJ files suitable for feeding to a linker; or (3) object files starting at offset 0, suitable for copying to ROMs. A86 is a full-featured, professional-quality program. I designed A86 to be as closely compatible with the standard Intel/IBM assembly language as possible, given that I insisted upon making design and language enhancements necessary to make A86 the best possible assembler. Some of A86's most notable features are.
I am a full-time shareware author. I have worked with Intel microprocessors since the early days of the 8080. As an employee of Intel, I was a part of the two-man team that implemented the first ASM86 assembler. Having worked with all the processors of the 86 family from the beginning, I know as much as anyone about their machine-language architecture.
A86 and D86 themselves are extremely mature, solid programs. They have been in existence since 1984, running first under my own, proprietary operating system; then later under the Xenix operating system on Altos computers, used by myself and my clients. I have been making a decent living from my products for some time now, and with your much-appreciated support, I will continue to improve my products, and enhance them with new, related offerings.
I have no plans to move from my present location at least through the millennium. So you can write to:
Eric Isaacson Software
416 East University Ave.
Bloomington, IN 47401-4739
or call 1-812-339-1811 voice, or 1-812-335-1611 fax.
Sorry, I can't guarantee to return everybody's long distance calls. If you'd like to be SURE I'll get back to you, please invite me to call you back collect, or tell me to charge the cost of the call to your credit card.
I also have a section on CompuServe: just type GO ZIPKEY to any ! prompt. (ZIPKEY is the name of my other product line, a pop-up zipcode directory.) My Internet address is eric@eji.com.
PLEASE contact me if you find bugs in my programs; I'll fix them! I accept bug reports from anyone, registered or non-registered, no questions asked. It's very frustrating to hear about people telling each other about bugs, and not telling me. I still await Greg Wettstein's bug list.
To give you a feeling for the operation of A86, I have provided some source files for you to assemble. You should make sure your current directory (or a PATH directory) is the one that contains this assembler package, and perform the following operations to see the assembler package in action.
First, let's assemble a very short program; a program that sends an ASCII form feed (hex 0C) to your line printer. The source for this program is PAGE.8; type the command TYPE PAGE.8 to see how simple this program is: note the lack of red tape directives (NAME, ASSUME, END, PUBLIC, etc.) required by other assemblers. Now type the command A86 PAGE.8 to assemble the file. Make sure you don't blink your eyes after typing the command; you'll miss the assembly, because A86 is FAST.
You now have a file PAGE.COM, which is an executable program. If you now type the command PAGE with your printer turned on, and if your printer recognizes the form feed character, then it should advance to the next page. You have just created a useful tool. By altering the DB line in the source code that contains the form feed, you can create tools to output other control sequences to your printer.
Now type the command ERDEMO, invoking the batch file ERDEMO.BAT. This will invoke an assembly of a source file PAGE.BAD (copied from PAGE.BL so you can run this demo again), into which I have deliberately placed an erroneous statement, XCHG BL,AX. Note that A86 tells you that it has inserted error messages into PAGE.BAD, and saved the original source in PAGE.OLD.
Now use your favorite text editor to edit PAGE.BAD. You can use your editor's string search function to find a tilde symbol, which precedes all A86 error messages. Without altering the messages, change the BL to BX, and exit your editor. Now type the command A86 PAGE.BAD to reassemble the file. You should get a successful assembly. Now type the command TYPE PAGE.BAD, and note that A86 has removed the error messages for you. Wasn't that easy?
Let's see A86 assemble a program with four source files. Type the command A86 REV.8 to the console. A86 will assemble the REV.8 file you specified, see that there are undefined symbols in the program, then assemble the files LINES.8, MSDOS.8, and USAGE.8, listed in the library file A86.LIB, which I created using the tool A86LIB available only to registered users.
REV is a tool that exists in the Unix operating system. It is a "filter"; that is, it reads from standard input, transforms the input, and outputs the transformed data to standard output. The transformation that REV performs is to reverse all lines, so that they come out backwards.
The usefulness of REV is in conjunction with other tools. In particular, suppose you have a list of words that you wanted sorted according to their last letters, not their first. You run the list through REV, to get the words spelled backwards. Then you run that output through SORT, to sort them that way. Finally, you run the output of SORT through REV again, to get the words spelled forwards again, but still sorted according to their backwards spellings.
The normal usage of REV is, therefore, in conjunction with redirection of standard input and output; e.g. REV < infile > outfile. If you want to just see if REV works, type REV, the Enter key, your first name, the Enter key, your last name, the Enter key, the F6 key, and the Enter key. You'll get your first and last name spelled backwards.
Type the command MTCOLS to execute the batch file MTCOLS.BAT. Observe that the batch file assembles the file TCOLS.8 into the program TCOLS.COM. This assembly uses the +L and +X switches to produce a listing file TCOLS.LST and a cross-reference file TCOLS.XRF.
Type the command TCOLS. The TCOLS program you just assembled will execute, and will notice that you have given it no parameters. It thus gives you a self-documenting message. Note that towards the end of the message is an example showing how TCOLS can be used to print .XRF listings. You can do so now by turning your printer on and typing an appropriate command; e.g.,
TCOLS <TCOLS.XRF 4 6 80 66 >PRN
for 4 columns, skip 6 lines between pages, which are 80 columns by 66 lines.
If you examine the file TCOLS.LST with your favorite text editor, you'll find a complete listing of the program, including the expansion of the DEFAULT macro defined within the program.
Everything I say about A86 applies equally well to my A386 assembler, whose program name is A386. The additional features of A386 consist of the 32-bit register set and additional instructions and instruction forms, outlined in Chapter 6. A386 is available only on the extended-package registered disk.
Before you invoke A86 you must have an assembly-language source program to assemble. A source program is an ASCII text file, created with the text editor of your choice. The editor must produce a file that is free of internal records known only to the editor. Some of the fancier word processors will require you to use a "plain text" mode to ensure that the file is free of such records.
Once you have a source file, the A86 program is all you need to produce programs with the COM extension. If you would like to produce programs with an EXE extension, you need to obtain a LINK program. All linkers running under MS-DOS are compatible with A86. If you have any high-level-language compiler, it should come with a linker that you can use with A86 (even if you are not linking in modules from that compiler). Also, old versions of DOS (3.x and older) come with a linker called LINK.EXE.
This manual will fully explain to you the correct syntax of an A86 program, but it is not intended to teach you about the 86-family instruction set, or about assembly-language interfacing to your computer or your operating system.
The instruction set charts in Chapters 6 and 7 give concise, one-line descriptions of each instruction, but they don't go into any detail about instruction usage, or about how to make system calls to input from keyboard or disk, output to screen, printer or disk, etc. For that, you need a book that covers the MS-DOS operating system and the BIOS for the IBM-PC. I am currently using DOS Programmer's Reference by Terry Dettmann. At a more instructional level, my users report that Peter Norton's Assembly Language Guide to the IBM-PC has been helpful. There is now a book written specifically for A86: "The 8086 Microprocessor: Programming & Interfacing The PC", by Kenneth J. Ayala, West Publishing, ISBN 0-314-01242-7. If you would like to see lots of programming examples using A86, be sure to order my source-file library package, which costs $50 in addition to the A86 registration price.
To invoke A86, you must provide a program invocation line, either typed to the console when the DOS command prompt appears, or included in a batch file. The program invocation line consists of the program name A86, followed by assembler switches (described in the next section), and the names of the source files you want to assemble, and of the output files you want to produce. If the output files all have their standard extensions, they may appear in any order: before, after, or even intermixed with the source file names. If they don't have their standard extensions, you must give the source file names first, followed by the word TO, followed by the output file names. Each non-standard name following the word TO will be assigned to the first previously-unassigned output file in order: program, symbols, listing, then cross-reference.
You may use the wild card delimiters * and ? if you wish, to denote a group of source files to be assembled. A86 will sort all matching names into alphabetical order for each wild card specification; so the files will be assembled in the same order even if they get jumbled up within a directory.
If you provide a name without a period or an extension, A86 will use that as the output program file name, appending to it the default extension as follows.
If you want your program file to have no extension, you end the file name with a period.
You may omit any of the output file names if you wish. If you do so, A86 will output the program source.COM (or source.OBJ or source.BIN), where "source" is a name derived from the list of source files, according to the rules described in the section "Strategies for Source File Maintenance" later in this chapter. Any of the other output files will use the name of the program output file, combined with the standard extension for that output file.
In addition to input and output file names, you may intersperse assembler switch settings anywhere after the A86 program name. They are all acted upon immediately, no matter where they are on the command line. Some of the switches are discussed in more detail elsewhere; I'll summarize them here.
If n is not given after +L, a value of 39 is assumed: symbol table, hex pointer only if there are object bytes, no listing control lines, and macro expansion lines, conditional control lines, and skipped lines are all listed. If +L is not given at all, then a listing file will be produced only if it is explicitly named on the invocation line.
If -T (or +T0) is specified, then no titling or pagination is done at all. If the switch is not specified at all, a default value of 52 is assumed (auto-paging, auto-titling, and source file line, but no auto-section and no FF instead of LF). Of course, if there is no listing file, the T switch has no effect.
Unless otherwise stated, the default setting for all the switches is "minus". Multiple switches can be specified with a single sign; e.g. +OG15L55 is the same as +O+G15+L55.
To allow you to customize A86, the assembler examines the MS-DOS environment variable named A86 when it is invoked. If there is such a variable, its contents are inserted before the invocation command tail, as if you had typed them yourself.
For example, if you execute the command SET A86=+O while in DOS (typically in the AUTOEXEC.BAT file run when the computer is started), then the O switch will be "plus", unless overridden with a "minus" setting in the command line.
You may also include one or more file names in the A86 environment variable. Those files will always be assembled first, before the files you specify on the command line. This allows you to set up a library of macro definitions, which will always be automatically available to your programs. Thus, for example, the DOS command SET A86=C:\A86\MACDEF.8 +O will cause the O switch to default ON, and will also cause the file MACDEF.8 of subdirectory A86 of drive C to always be assembled.
The following feature is a bit advanced. If you're not familiar with the practice of redirecting standard input, you may safely skip this section.
A86 can also be configured to take its command arguments from standard input, in addition to the invocation command tail or the A86 environment variable. This allows A86 to be used in those menu-driven systems that don't generate command tails for programs. It also allows other programs to create lists of files to be assembled, then "pipe" the list to A86.
Here's how the feature works: when the last command argument to A86 is an ampersand &, A86 will prompt for standard input. If the ampersand is seen but there are other things following it, the ampersand is ignored.
For example, you can place a list of file names and switch settings into a file called FILELIST. You can then invoke the assembler via
A86 <FILELIST &
which will cause the contents of the FILELIST file to be used as a command line.
You may place an ampersand at the end of your A86 environment variable. If you do so, then A86 will prompt for file names whenever it is invoked without any command arguments (you type A86 followed immediately by the Enter key to the MS-DOS prompt). This is the mode used if you have a menu system that can't generate an invocation command tail.
Note: when you redirect standard input so that it comes from a file, A86 will read all the lines of the file (up to a limit of 1023 bytes), and substitute spaces for the line breaks. Thus you may give the file names on individual lines, for readability. However, if the feature is invoked manually (no redirection), so that you are typing in the line after the prompt, A86 will take the first line only. You need to give all your switches and files on that one line.
A86 encourages modular programming, by letting you break your source into separate files, with complete impunity. A86 has no concern whatsoever for file breaks -- it treats the sequence of files as a single source code stream.
You should consider one or more of the following strategies, which I have adopted in my source file management.
A86 requires MS-DOS V2.00 or later. No BIOS or lower-level calls are made by A86, so A86 should run on any MS-DOS machine. Please let me know if you find this not to be the case.
A86 itself is a small program, and it is fairly flexible about the memory it uses. You can assemble with only 40K bytes of memory beyond the program itself, which in the current version is about 35K bytes -- a total of 75K bytes beyond the operating system. There is no longer any limit on the size of source files assembled under A86. The more memory you have, the more capacity A86 has, in symbol table size and output file size. If it can, A86 will use up to 400K bytes of memory.
This chapter begins the description of the A86 language. It's a bit more tutorial in nature than the rest of the manual. I'll start by describing the elementary building blocks of the language.
The statements in an A86 source file can be classified in three general categories: instruction statements, data allocation statements, and assembler directives. An instruction statement uses an easily remembered name (a mnemonic) and possibly one or more operands to specify a machine instruction to be generated. A data allocation statement reserves, and optionally initializes, memory space for program data. An assembler directive is a statement that gives special instructions to the assembler. Directives are unlike the instruction and data allocation statements in that they do not specify the actual contents of memory. Examples of the three types of A86 statements are given below. These are provided to give you a general idea of what the different kinds of statements look like.
MOV AX,BX              ; instruction statement
CALL SORT_PROCEDURE    ; instruction statement
ADD AL,7               ; instruction statement
A_VARIABLE DW 0        ; data allocation statement
DB 'HELLO'             ; data allocation statement
CODE SEGMENT           ; assembler directive
ITEM_COUNT EQU 5       ; assembler directive
The statements in an A86 source file are made up of reserved symbols, user symbols, numbers, strings, special characters, and comments.
Symbols are the "words" of the A86 language. All symbols are a collection of consecutive letters, numbers, and assorted special characters: _, @, $, and ?. Symbols cannot begin with digits: anything that begins with a digit is a number. Symbols can begin with any of the special characters just listed. Symbols can also begin with a period, which is the only place within the symbol name a period can appear.
Reserved symbols have a built-in meaning to the assembler: instruction mnemonics (MOV, CALL), directive names (DB, STRUC), register names, expression operators, etc. User symbols have meanings defined by the programmer: program locations, variable names, equated constants, etc. The user symbol name is considered unique up to 127 characters, but it can be of any length (up to 255 characters). Examples of user symbols are COUNT, L1, and A_BYTE.
Numbers in A86 may be expressed as decimal, hexadecimal, octal, binary, or decimal "K". These must begin with a decimal digit and, except in the case of a decimal or hexadecimal number, must end with "x" followed by a letter identifying the base of the number. A number without an identifying base is hexadecimal if the first digit is 0; decimal if the first digit is 1 through 9. Examples of A86 numbers are: 123 (decimal), 0ABC (hexadecimal), 1776xQ (octal), 10100110xB (binary), and 32K (decimal 32 times 1024).
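As a small illustration of the number forms just described, the fragment below loads each of the example values into a register. The choice of registers is arbitrary and only for illustration; this is a sketch, not a complete program.

```asm
; Fragment only: each constant uses one of the number forms described above.
MOV AX,123          ; decimal: the first digit is 1 through 9
MOV AX,0ABC         ; hexadecimal: the first digit is 0
MOV AX,1776xQ       ; octal (decimal 1022)
MOV AL,10100110xB   ; binary (hex A6, which fits in a byte register)
MOV AX,32K          ; decimal "K": 32 times 1024, or 32768
```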
Strings are characters enclosed in either single or double quotes. Examples of strings are: '1st string' and "SIGN-ON MESSAGE, V1.0". If you wish to include a quote mark within a string, you can double it; for example, 'that''s nice' specifies a single quote mark within the string. The single quote and double quote are two of many special characters used in the assembly language. Others, run together in a list, are: ! $ ? ; : = , [ ] . + - ( ) * / > . The space and tab characters are also special characters, used as separators in the assembly language.
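The string forms above might appear in data allocation statements as sketched below. The variable names FIRST_MSG, SIGN_ON, and NICE_MSG are hypothetical, chosen only for this illustration.

```asm
; Fragment only: the variable names are hypothetical.
FIRST_MSG DB '1st string'              ; string in single quotes
SIGN_ON   DB "SIGN-ON MESSAGE, V1.0"   ; string in double quotes
NICE_MSG  DB 'that''s nice'            ; doubled quote stands for one quote mark
```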
A comment is a sequence of characters used for program documentation only; it is ignored by the assembler. Comments begin with a semicolon (;) and run to the end of the line on which they are started. Examples of lines with comments are shown below.
; This entire line is a comment.
MOV AX,BX ; This is a comment next to an instruction statement.
Alternatively, for compatibility with other assemblers, I provide the COMMENT directive. The next non-blank character after COMMENT is a delimiter to a comment that can run across many lines; all text is ignored, until a second instance of the delimiter is seen. For example,
COMMENT 'This comment
runs across two lines'
I don't like COMMENT, because I think it's very dangerous. If, for example, you have two COMMENTs in your program, and you forget to close the first one, the assembler will happily ignore all source code between the comments. If that source code does not happen to contain any labels referenced elsewhere, the error may not be detected until your program blows up. For multiline comments, I urge you to simply start each line with a semicolon.
Statements in A86 are line-oriented, which means that statements may not be broken across line boundaries. A86 source lines may be entered in a free-form fashion; that is, without regard to the column orientation of the symbols and special characters.
PLEASE NOTE: Because an A86 line is free formatted, there is no need for you to put the operands to your instructions in a separate column. You organize things into columns when you want to visually scan down the column; and you practically never scan operands separate from their opcodes. Realizing this, you may wish to separate your operands from the mnemonic with a space instead of a tab, making the line less disjointed and hence easier to read. You will also have room for a longer comment after the instruction.
A86 is a strongly typed assembly language. What this means is that operands to instructions (registers, variables, labels, constants) have a type attribute associated with them which tells the assembler something about them. For example, the operand 4 has type "number", which tells the assembler that it is a numerical constant, rather than a register or an address in the code or data. The following discussion explains the types associated with instruction operands and how this type information is used to generate particular machine opcodes from general purpose instruction mnemonics.
The 8086 has 8 general purpose word (two-byte) registers: AX,BX,CX,DX,SI,DI,BP, and SP. The first four of those registers are subdivided into 8 general purpose one-byte registers AH,AL,BH,BL,CH,CL,DH, and DL. There are also 4 16-bit segment registers CS,DS,ES, and SS, used for addressing memory; and the implicit instruction-pointer register (referred to as IP, although "IP" is not part of the A86 assembly language).
My A386 assembler supports the two additional segment registers FS and GS, plus the 32-bit general registers EAX,EBX,ECX,EDX,ESI,EDI,EBP, and ESP. The lower 16 bits of each 32-bit register are the corresponding 16-bit register (without the E in its name).
A variable is a unit of program data with a symbolic name, residing at a specific location in memory. A variable is given a type at the time it is defined, which indicates the number of bytes associated with its symbol. Variables defined with a DB statement are given type BYTE (one byte), and those defined with the DW statement are given type WORD (two bytes). Examples:
BYTE_VAR DB 0 ; A byte variable.
WORD_VAR DW 0 ; A word variable.
A label is a symbol referring to a location in the program code. It is defined as an identifier, followed by a colon (:), used to represent the location of a particular instruction or data structure. Such a label may be on a line by itself or it may immediately precede an instruction statement (on the same line). In the following example, LABEL_1 and LABEL_2 are both labels for the MOV AL,BL instruction.
LABEL_1:
LABEL_2: MOV AL,BL
In the A86 assembly language, labels have a type identical to that of constants. Thus, the instruction MOV BX,LABEL_2 is accepted, and code is generated to move the immediate constant address of LABEL_2 into BX.
IMPORTANT: you must understand the distinction between a label and a variable, because you may generate a different instruction than you intended if you confuse them. For example, if you declare XXX: DW ?, the colon following the XXX means that XXX is a label; the instruction MOV SI,XXX moves the immediate constant address of XXX into the SI register. On the other hand, if you declare XXX DW ? with no colon, then XXX is a word variable; the same instruction MOV SI,XXX now does something different: it loads the run-time value of the memory word XXX into the SI register. You can override the definition of a symbol in any usage with the immediate-value operator OFFSET or the memory-variable operators B,W,D,Q, or T. Thus, MOV SI,OFFSET XXX loads the immediate value pointing to XXX no matter how XXX was declared; MOV SI,XXX W loads the word-variable at XXX no matter how XXX was declared.
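As a minimal sketch of the distinction above, the fragment below declares one label and one word variable and then uses the four forms of MOV just described. The names LAB_X and VAR_X are illustrative only; they are not part of the A86 package.

```asm
LAB_X:  DW ?              ; colon: LAB_X is a label (typed like a constant)
VAR_X   DW ?              ; no colon: VAR_X is a word variable

  MOV SI,LAB_X            ; immediate: SI = address of LAB_X
  MOV SI,VAR_X            ; memory: SI = run-time contents of the word VAR_X
  MOV SI,OFFSET VAR_X     ; forced immediate: SI = address of VAR_X
  MOV SI,LAB_X W          ; forced memory: SI = contents of the word at LAB_X
```

When in doubt about how a symbol was declared, writing the OFFSET or W operator explicitly makes the intended form visible at the point of use.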
A constant is a numerical value computed from an assembly-time expression. For example, 123 and 3 + 2 - 1 both represent constants. A constant differs from a variable in that it specifies a pure number, known by the assembler before the program is run, rather than a number fetched from memory when the program is running.
My A86 assembly language is modeled after Intel's ASM86 language, which uses general purpose mnemonics to represent classes of machine instructions rather than having a different mnemonic for each opcode. For example, the MOV mnemonic is used for all of the following: move byte register to byte register, load word register from memory, load byte register with constant, move word register to memory, move immediate value to word register, move immediate value to memory, etc. This feature saves you from having to distinguish "move" from "load", "move constant" from "move memory", "move byte" from "move word", etc.
Because the same general purpose mnemonic can apply to several different machine opcodes, A86 uses the type information associated with an instruction's operands in determining the particular opcode to produce. The type information associated with instruction operands is also used to discover programmer errors, such as attempting to move a word register to a byte register.
The examples that follow illustrate the use of operand types in generating machine opcodes and discovering programmer errors. In each of the examples, the MOV instruction produces a different 8086 opcode, or an error. The symbols used in the examples are assumed to be defined as follows: BVAR is a byte variable, WVAR is a word variable, and LAB is a label. As you examine these MOV instructions, notice that, in each case, the operand on the right is considered to be the source and the operand on the left is the destination. This is a general rule that applies to all two-operand instruction statements.
  MOV AX,BX     ; (8B) Move word register to word register.
  MOV AX,BL     ; ERROR: Type conflict (word,byte).
  MOV CX,5      ; (B9) Move constant to word register.
  MOV BVAR,AL   ; (A2) Move AL register to byte in memory.
  MOV AL,WVAR   ; ERROR: Type conflict (byte,word).
  MOV LAB,5     ; ERROR: Can't use label/constant as dest. to MOV.
  MOV WVAR,SI   ; (89) Move word register to word in memory.
  MOV BL,1024   ; ERROR: Constant is too large to fit in a byte.
As a "nudge" in the direction of structured programming, A86 offers the IF statement. Suppose you want to conditionally skip around just one instruction. Ordinarily, this would require, for example:

  JNZ >L1       ; skip the following move if NZ
  MOV AX,BX     ; make this move only if Z
L1:             ; this label exists only for the above skip

You may replace the above code with the single line:

  IF Z MOV AX,BX
The above line generates exactly the same code as the previous 3 lines -- a conditional jump of the opposite condition, around the statement given in the tail of the IF statement. The statement can be a macro call, giving you the opportunity to skip something more complicated.
You may use any condition that would follow the "J" in a conditional jump instruction, except CXZ, which does not have a reverse condition. The assembler interprets the condition by prefixing it with a "J", so the symbols "C", "NC", "Z", "NZ", etc. are not reserved by the assembler and can be defined in other contexts.
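The same pattern works with other conditions. The sketch below, which assumes nothing beyond the IF statement as described above, clamps AL to a maximum of 9 and then shows the longhand code it stands for. The value 9 and the choice of AL are arbitrary.

```asm
  CMP AL,9          ; compare AL with 9 (unsigned)
  IF A MOV AL,9     ; if AL was above 9, clamp it to 9

; The two lines above stand for this longhand version:
  CMP AL,9
  JBE >L1           ; the opposite condition of A skips the move
  MOV AL,9
L1:
```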
A86 will accept any number of register operands for the instructions PUSH, POP, INC, and DEC; it will generate the appropriate machine instruction for each operand. For example, the statement PUSH AX,BX is the same as the two statements PUSH AX and PUSH BX.
A numeric operand appearing in an INC or DEC statement will cause the previous INC(s) or DEC(s) to be propagated that number of times. For example, the statement INC AX,4 will generate 4 INC AX instructions. The statement DEC AL,BX,2 will generate DEC AL, DEC BX, DEC AL, DEC BX. Sorry, numeric operands are not allowed if any of the operands affected was a forward reference or relocatable quantity; e.g., INC FOO,2 where FOO is undefined. In most such cases, you'll want to code the more efficient ADD FOO,2 anyway.
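As an illustrative sketch of the multiple-operand forms, each line below is followed by a comment listing the instructions the text above says A86 generates for it. The particular registers are arbitrary.

```asm
  PUSH AX,BX,CX     ; same as PUSH AX / PUSH BX / PUSH CX
  INC SI,DI         ; same as INC SI / INC DI
  DEC CX,3          ; same as DEC CX / DEC CX / DEC CX
  POP CX,BX,AX      ; same as POP CX / POP BX / POP AX
```

Note that the POP list is written in the reverse order of the PUSH list, since each operand is handled left to right.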
A86 will accept a numeric operand to the string instructions STOSB, STOSW, MOVSB, and MOVSW (plus STOSD and MOVSD for A386). This causes A86 to generate that many copies of the given instruction. For small values (usually 2 through 4), this is more efficient than loading the number into CX and using the REP prefix.
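A small sketch of the size trade-off, assuming SI, DI, and the direction flag have already been set up for the copy. The byte counts in the comments are for the standard 8086 encodings of these instructions.

```asm
  MOVSW 3           ; generates MOVSW / MOVSW / MOVSW: 3 bytes of code

; The REP form of the same three-word copy:
  MOV CX,3          ; 3 bytes
  REP MOVSW         ; 2 bytes, and CX is left at zero afterwards
```

The repeat-count form also leaves CX untouched, which can save a PUSH and POP when CX is in use for something else.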
Programmers accustomed to the conditional return instructions of the 8080/Z80 will appreciate the following feature: A86 allows the operand to a conditional jump instruction to be one of the three RET instructions RET, RETF, or IRET. The assembler will find a nearby return instruction of the indicated flavor, and use that as the target for the conditional jump. For example, JZ RET is the replacement for the 8080's RZ return-if-zero instruction. In other 8086 assembly languages, you have to find the nearby instruction yourself, attach a label to it, and use that label. Note that it does not suffice to attach a label to a single RET instruction and use that label throughout the program: the range of conditional jumps is only 128 bytes in either direction.
What happens if A86 does not find a nearby return instruction? In that case, A86 issues an error, "02 Jump > 128", for the next matching return instruction in the program. If there is no subsequent return instruction, the return mnemonic will appear as an undefined symbol at the end of the program. In either case, you correct the problem by inserting a free-standing return instruction at some nearby point in the program, where it will not affect the existing code (typically following an unconditional JMP instruction). If there is no good place to insert a return instruction, you can always replace the "Jcond RET" with an "IF cond RET".
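A minimal sketch of both forms follows. The routine name SKIP_ZERO and its body are hypothetical; only the JZ RET and IF Z RET lines illustrate the feature.

```asm
SKIP_ZERO:
  OR AL,AL          ; set the flags from AL
  JZ RET            ; return if zero: A86 aims the jump at a nearby RET
  INC AL
  RET               ; the nearby RET that JZ RET can use

; Where no RET is within range, the IF form carries its own RET:
  OR AL,AL
  IF Z RET          ; jumps around the RET when NZ
```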
There are a number of MOV and XCHG instructions available in A86 that are not a part of the machine instruction set.
First, moves between segment registers, and of immediate constants into segment registers are allowed. For example, if you code MOV ES,DS, the assembler will generate a PUSH DS followed by a POP ES; which will effect the move that you intended. If you code MOV DS,0, the assembler will generate PUSH AX; MOV AX,0; MOV DS,AX; POP AX. This is mainly a convenience for D86 users to load segment registers manually.
Second, MOV allows 3 operands. A statement MOV x,y,z is equivalent to the two statements MOV y,z followed by MOV x,y. Sorry, but segment overrides are not allowed in conjunction with 3-operand MOVs. The override preceding the MOV is ambiguous in its meaning; and overrides within operands cannot be handled correctly by A86. You'll have to code two MOV instructions if you want either or both to have a segment override.
Third, A86 accepts a MOV of a word-sized memory operand into another word-sized memory operand. A86 handles this the same way it handles a MOV of segment registers: it generates a PUSH of the source followed by a POP of the destination.
Finally, A86 allows the XCHG of a segment register (except CS) with any other word-sized quantity, as well as the XCHG of two word-sized memory quantities. If there is no machine instruction available for XCHG a,b, then A86 generates PUSH a followed by MOV a,b followed by POP b.
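The sketch below gathers the four extensions in one place, with each comment giving the expansion described in the text above. WORD_A and WORD_B are hypothetical word variables introduced only for this illustration.

```asm
WORD_A  DW ?
WORD_B  DW ?

  MOV ES,DS             ; generates PUSH DS / POP ES
  MOV DS,0              ; generates PUSH AX / MOV AX,0 / MOV DS,AX / POP AX
  MOV AX,BX,CX          ; same as MOV BX,CX followed by MOV AX,BX
  MOV WORD_A,WORD_B     ; generates PUSH WORD_B / POP WORD_A
  XCHG WORD_A,WORD_B    ; PUSH WORD_A / MOV WORD_A,WORD_B / POP WORD_B
```

All of these expansions go through the stack, so they need a valid SS:SP and are slower than a single machine instruction.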
If you examine most assembly language program symbol tables, you will find that the symbols can be partitioned into two levels of significance. About half the symbols are the names of procedures and variables having global significance. If the names of these symbols are chosen intelligently and carefully, the program's readability improves drastically. (They usually aren't chosen well, most often because the assembler restricts symbols to 6 letters, or because the programmer's habits are influenced by such assemblers.)
The other half of the symbols in a program have a much lower, local significance. They are only place markers used to implement small loops and local branching (e.g., "skip the next 2 instructions if the Z-flag is set"). Assigning full-blown names to these symbols reduces the readability of your program in two ways: First, it is harder to recognize local jumps for what they are -- they are usually the assembly language equivalent of high level language constructs like IF statements and WHILE loops.
Second, it is harder to follow the global, significant symbols because they are buried in a sea of the place marker symbols in the symbol table.
A86 solves this problem with local symbols. If a symbol in your program consists of a single letter followed by one or more decimal digits (L3, X123, Y37, etc.), then the symbol is a local symbol. Local symbols do not appear in the A86 +X cross-reference listing. They can also be redefined to something completely different later in the program. Local symbols can be of any type: labels, memory variables, etc.
Because local symbols can be redefined, you must take care to specify which one you are referring to in your program. If your reference is a forward reference (the label occurs further down in the program from the reference), then the reference must be preceded by a ">". For example,
L2:
  MOVSB
  INC BX
  LOOP L2       ; lack of ">" means L2 is above this statement
  .
  .
  JNZ >L2       ; ">" indicates L2 is below this statement
  .
  JMP >L2       ; JMP L2 is disallowed here: cannot overlap ranges
  .
L2:
I recommend that you assign all your local labels the names L0 through L9. If your program is so complex that it needs more than 10 place holders in any one stretch of code, then that stretch needs to be rewritten.
Those of you who have examined 86 family opcodes with an eagle eye will have noticed a somewhat spurious "0A" opcode generated after every AAM or AAD instruction. The opcode is there to provide the constant divisor or multiplicand for the instruction. Believe it or not, there wasn't enough room in the microcode of the original 8086 to hold this constant! Although Intel has never announced the generality of AAM and AAD, it is there: you can substitute any other constant for 0A (decimal 10), and that constant will be used. A86 supports this by letting you give a constant byte-sized operand to AAM or AAD. Particularly useful are the instructions AAM 16, which unpacks AL into nibbles AH and AL; and AAD 16, which reverses the process, packing nibbles AH and AL into AL.
WARNING: A couple of my users point out to me that the AAD instruction with a general operand won't work on the NEC V20 and V30 chips. The operand is assumed to be 10 no matter what it really is. Since a large number of PC "speed up" kits involve switching to NEC chips, this will be seen on many PC's. You should not use AAD with an operand if you want your program to run on everybody's machine. Too bad. AAM works fine, though.
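A short worked sketch of the nibble trick, using the arbitrary starting value A7 hex. The register contents in the comments follow from the instruction definitions: AAM divides AL by its operand, and AAD multiplies AH by its operand and adds AL.

```asm
  MOV AL,0A7        ; AL = A7 hex
  AAM 16            ; AH = 0A, AL = 07  (AL / 16: AH=Quo AL=Rem)
  AAD 16            ; AL = 16*AH + AL = A7 again, AH = 0
                    ; (per the warning above, not reliable on NEC V20/V30)
```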
A86 allows the TEST instruction to have a single operand, to set the flags according to the value of the operand. If the operand is a register, A86 generates a TEST of the register with itself. If the operand is a memory quantity, A86 generates a TEST of the memory with the constant -1 (i.e., the quantity will be ANDed with an all 1's constant). For example, instead of TEST DL,DL, you can code simply TEST DL. Instead of TEST WVAR,0FFFF, you can code simply TEST WVAR.
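A brief sketch of the single-operand form in use. WVAR here is a hypothetical word variable, and the branch structure is only one way the resulting flags might be consumed.

```asm
WVAR  DW ?          ; a word variable (illustrative)

  TEST DL           ; assembled as TEST DL,DL
  JZ >L1            ; taken when DL is zero
  TEST WVAR         ; assembled as TEST WVAR,0FFFF
  JZ >L1            ; taken when the word at WVAR is zero
  INC DL            ; reached only when both were nonzero
L1:
```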
Many assembly-language programmers are in the habit of using, for example, LEA SI,MEMLOC instead of the equivalent MOV SI,OFFSET MEMLOC to load an immediate value that represents the pointer to a memory location. However, the LEA instruction form generates one more byte of object code than the MOV form. A86 recognizes this situation and generates the more-efficient MOV instruction when it can. This also applies to register moves: MOV AX,BX instead of LEA AX,[BX].
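The following sketch shows which LEA forms are candidates for the substitution. MEMLOC is a hypothetical variable; the byte counts are for the standard 8086 encodings of LEA with a 16-bit address and MOV with a 16-bit immediate.

```asm
MEMLOC  DW ?

  LEA SI,MEMLOC         ; assembled as MOV SI,OFFSET MEMLOC: 3 bytes, not 4
  MOV SI,OFFSET MEMLOC  ; the explicit spelling of the same instruction
  LEA AX,[BX]           ; assembled as MOV AX,BX
  LEA AX,[BX+SI+5]      ; a real address computation: no MOV equivalent exists
```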
I've gotten a little flak from some users about this feature. They claim it violates my policy against "behind your back" actions. But I feel that this feature is completely equivalent to code optimizations in other situations: the short JMP form instead of the equivalent near JMP; a byte operand to ADD SI,4 instead of a word operand; the one-byte XCHG AX,BX instead of the general XCHG rw,ew form; etc, etc, etc. In situations where there is absolute functional equivalence between forms, A86 tries to generate the most efficient form. But for those who are not convinced, I offer the +G2 switch, described in Chapter 3.
Some users have also gotten the mistaken impression, from reading Intel's confusing specs, that the longer LEA is sometimes faster than the shorter MOV. This is never the case: those users are reading the clock counts for the memory-fetch forms of MOV, not the register-only or immediate-value forms. If you don't believe it, try timing 1000 consecutive LEA's in a loop that executes 50000 times, vs. a similar loop with the equivalent MOV.
In this chapter we discuss in detail the instruction set supported by both the A86 and A386 assemblers. To use any of the 32-bit registers, the extra segment registers FS and GS, or the instructions marked with a 3, 4, 5 or 6 in the instruction list, you need my A386 assembler, available only if you purchase the registered extended package.
Most memory data accessing in the 86 family is accomplished via the mechanism of the effective address. Wherever an effective address specifier "eb", "ew", "ed", or "ev" appears in the list of instructions, you may use a wide variety of actual operands in that instruction. These include general registers, memory variables, and a variety of indexed memory quantities.
  DATA_PTR  DW ?
  ESC_CHAR  DB ?

Later, you can load or store these variables:

  MOV ESC_CHAR,BL    ; store the byte variable ESC_CHAR
  MOV DATA_PTR,081   ; initialize DATA_PTR
  MOV SI,DATA_PTR    ; load DATA_PTR into SI for use
  LODSW              ; fetch the word pointed to by DATA_PTR
Alternatively, you can address specific unnamed memory locations by enclosing the location value in square brackets; for example,
MOV AL,[02000] ; load contents of location 02000 into AL
Note that A86 discerned from context (loading into AL) that a BYTE at 02000 was intended. Sometimes this is impossible, and you must specify byte or word:

  INC B[02000]       ; increment the byte at location 02000
  MOV W[02000],0     ; set the WORD at location 02000 to zero

  MOV AX,[BX]
  MOV CX,W[SI+17]
  MOV AX,[BX+SI+5]
  MOV AX,[BX][SI]5   ; another way to write the same instruction
Or, indexing can be accomplished by declaring variables in a based structure (see the STRUC directive in Chapter 9):

  STRUC [BP]         ; NOTE: based structures are unique to A86!
    BP_SAVE  DW ?    ; BP_SAVE is a word at [BP]
    RET_ADDR DW ?    ; RET_ADDR is a word at [BP+2]
    PARM1    DW ?    ; PARM1 is a word at [BP+4]
    PARM2    DW ?    ; PARM2 is a word at [BP+6]
  ENDS               ; end of structure
  INC PARM1          ; equivalent to INC W[BP+4]
Finally, indexing can be done by mixing explicit components with declared ones:

  TABLE DB 4,2,1,3,5
  MOV AL,TABLE[BX]   ; load byte number BX of TABLE
The 386 and later processors also support indexing using any of the eight 32-bit general registers. This type of indexing is of limited use for memory referencing from real-mode programs (most programs running under DOS), since offsets greater than 64K are disallowed in real mode (you will get a General Protection Fault if you try it). 32-bit indexing is, however, useful in conjunction with the LEA instruction, giving an extremely powerful register arithmetic instruction. For example, LEA ECX,[EAX+2*EBX+17000] performs two additions and a multiplication, all in a single machine instruction. Since no memory access is actually attempted, this kind of LEA usage is allowed in real-mode DOS programs.
In 32-bit indexing, you may use one or two of any of the 32-bit general registers. You may also scale one of the indexing registers, by multiplying it by 2, 4, or 8. You may also add or subtract a constant of any size up to a doubleword capacity to the indexed quantity. If you use the same register twice and scale one of the instances of that register, you get, in effect, an odd-number scaling (3, 5, or 9) of that register; e.g., A386 will allow LEA EAX,[9*EBX] as an abbreviation for LEA EAX,[8*EBX+EBX].
Due to coding restrictions, the ESP register can be used only once within an indexed quantity, and cannot be scaled.
Some more examples of 32-bit indexing are:

  XCHG DX,[EAX]
  MOV AL,[EAX+EBX]
  ADD EBX,[ESI+8*ECX+3391811]
  LEA ECX,[4*EBX-7]
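As a further sketch of LEA used purely for register arithmetic, the lines below need A386 and a 386 or later processor. The odd-multiple abbreviation is the one described in the text above; the specific registers and constants are arbitrary.

```asm
  LEA EAX,[EBX+4*EBX]         ; EAX = 5*EBX
  LEA EAX,[5*EBX]             ; abbreviation for the line above
  LEA ECX,[EAX+2*EBX+17000]   ; ECX = EAX + 2*EBX + 17000
  LEA EDX,[8*EDX-1]           ; EDX = 8*EDX - 1
```

None of these lines touches memory or changes the flags, which is what makes LEA usable this way in real-mode programs.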
The 86 family has four segment registers, CS, DS, ES, and SS, used to address memory. The 386 and later processors add two more segment registers FS and GS. Each segment register points to 64K bytes of memory within the 1-megabyte memory space of the 86. (The start of the 64K is calculated by multiplying the segment register value by 16; i.e., by shifting the value left by one hex digit.) If your program's code, data and stack areas can all fit in the same 64K bytes, you can leave all the segment registers set to the same value. In that case, you won't have to think about segment registers: no matter which one is used to address memory, you'll still get the same 64K. If your program needs more than 64K, you must point one or more segment registers to other parts of the memory space. In this case, you must take care that your memory references use the segment registers you intended.
Each effective address memory access has a default segment register, to be used if you do not explicitly specify which segment register you wish. For most effective addresses, the default segment register is DS. The exceptions are those effective addresses that use the BP register for indexing. All BP-indexed memory references have a default of SS. (This is because BP is intended to be used for addressing local variables, stored on the stack.)
If you wish your memory access to use a different segment register, you provide a segment override byte before the instruction containing the effective address operand. In the A86 language, you code the override by giving the name of the segment register you wish before the instruction mnemonic. For example, suppose you want to load the AL register with the memory byte pointed to by BX. If you code MOV AL,[BX], the DS register will be used to determine which 64K segment BX is pointing to. If you want the byte to come from the CS-segment instead, you code CS MOV AL,[BX]. Be aware that the segment override byte has effect only upon the single instruction that follows it. If you have a sequence of instructions requiring overrides, you must give an override byte before every instruction in the sequence. (In that case, you may wish to consider changing the value of the default segment register for the duration of the sequence.)
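The short sketch below puts the default and overridden cases side by side. The operands are arbitrary; only the segment named in each comment matters.

```asm
  MOV AL,[BX]         ; default segment: DS
  MOV AL,[BP+2]       ; BP-indexed: default segment is SS
  CS MOV AL,[BX]      ; override: the byte comes from the CS segment
  ES MOV AX,[SI]      ; override applies to this one instruction only
  MOV CX,[SI+2]       ; no override here, so this is back to DS
```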
NOTE: This method for providing segment overrides is unique to the A86 assembler! The assemblers provided by Intel and IBM (MS-DOS) attempt to figure out segment allocation for you, and plug in segment override bytes "behind your back". In order to do this, those assemblers require you to inform them which variables and structures are pointed to by which segment registers. That is what the ASSUME directive in those assemblers is all about. I wrote Intel's first 86 assembler, ASM86, so I have been watching the situation since day one. Over the years, I have concluded that the ASSUME mechanism creates far, far more confusion than it solves. So I scrapped it; and the result is an assembler with far less red tape. But if your program needs more than 64K, you do have to manage those segment registers yourself; so take care!
Remember that all of the common instructions of the 86 family allow effective addresses as operands. (The only major functions that don't are the AL/AX specific ones: multiply, divide, and input/output). This means that you don't have to funnel many numbers through AL or AX just to do something with them. You can perform all the common arithmetic, PUSH/POP, and MOVes from any general register to any general register; from any memory location (indexed if you like) to any register; and (this is most often overlooked) from any register TO memory. The only thing you can't do in general is memory-to-memory. Among the more common operations that inexperienced 86 programmers overlook are.
This section outlines the number of program opcode bytes generated by effective-address specifications. This will let you make judgments when trying to keep your program as small as possible. The precise opcodes generated are explained in the text files EFF86.TXT in the A86 package, and EFF386.TXT in the A386 package.
Every instruction with a 16-bit effective address has an encoded byte, known as the effective address byte, following the instruction's main opcode. (For obscure reasons, Intel calls this byte the ModRM byte.) If the effective address is a memory variable, or an indexed memory location with a non-zero constant offset, then the effective address byte is immediately followed by the offset amount. Amounts in the range -128 to +127 are given by a single signed byte. Amounts outside that range are represented by a 2-byte offset.
In the instruction chart given later in this chapter, effective-address specification opcodes are denoted by a slash / followed either by the letter "r" or an octal digit. The meaning of the r-or-digit is explained in the EFF*.DOC files. For example, the instruction DIV CL falls under the DIV eb form in the instruction chart. The instruction occupies two bytes: the main opcode byte 0F6H, followed by a single effective address byte with no constant offsets involved. Similarly, the instruction DIV B[BX] occupies two bytes. For DIV B[BX+7] you must add an offset byte for the 7, making a total of three bytes. For DIV B[BX+1000] you must add a 2-byte offset for the 1000, making a total of 4 bytes. For DIV B[02000] (more typically coded with a symbolic name such as DIV MY_VAR_NAME), the instruction is also 4 bytes: the main opcode byte, the effective address byte, and the offset of the memory variable.
An anomalous case is the operand [BP]. The effective-address byte encoding for this particular operand was usurped by the simple-variable case. When A86 sees [BP], it must specify an 8-bit offset whose value is zero. Thus, the instruction DIV B[BP] occupies three bytes, not two. This anomaly does not apply to [BP+SI] or [BP+DI].
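The same counting rules apply to any instruction that takes an effective address. As a sketch, here they are applied to the INC eb form (main opcode FE /0 in the chart), including the [BP] anomaly; the operands are arbitrary.

```asm
  INC CL              ; 2 bytes: opcode, EA byte (register operand)
  INC B[BX]           ; 2 bytes: opcode, EA byte
  INC B[BX+7]         ; 3 bytes: opcode, EA byte, 1-byte offset
  INC B[BX+1000]      ; 4 bytes: opcode, EA byte, 2-byte offset
  INC B[02000]        ; 4 bytes: opcode, EA byte, 2-byte address
  INC B[BP]           ; 3 bytes: [BP] needs a zero offset byte
  INC B[BP+SI]        ; 2 bytes: the anomaly does not apply
```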
In A386, 32-bit indexing is signalled by a special address override opcode byte (67H) preceding the instruction. Following the override byte is the instruction's main opcode, followed by the effective-address specification. For a simple memory variable, the specification consists of a single effective-address byte followed by the 4-byte offset of the variable. For indexing involving a single, non-scaled index register other than ESP, the specification consists of a single byte followed by the constant offset component. For indexing involving two registers, scaling, or the ESP register, there are two bytes followed by the constant offset component. The constant offset component occupies no space if the offset is zero, one byte if it is between -128 and +127, and 4 bytes otherwise. There is no provision for a 16-bit-word-sized offset if you are using 32-bit indexing.
Note the distinction between the address override byte (67H) and the operand override byte (66H). A86 must supply an address override when the instruction involves a memory operand whose address has 32 bits. A86 must supply an operand override when the data being manipulated has 32 bits. In general, when a 32-bit register name appears inside the square brackets, that's an address override; when it appears outside the square brackets, that's an operand override. Examples:

  MOV DX,[BX]     ; needs neither override in a 16-bit segment
  MOV DX,[EBX]    ; needs an address override
  MOV EDX,[BX]    ; needs an operand override
  MOV EDX,[EBX]   ; needs both overrides
Also note that the generation of these override bytes is handled automatically by A86 when it scans the operands to an instruction. The only exceptions to this are the no-operand string operations: REP MOVSW, LODSD, SCASB, etc. For these instructions, the operand size is signalled by the last letter (B, W, or D) of the mnemonic; however, the addressing mode is not signalled by the mnemonic. If you are in 16-bit mode, as all simple DOS programs are, you need to precede a string instruction with an explicit A4 prefix if you wish to use 32-bit addressing ([ESI] and/or [EDI] with count ECX). If you are assembling to a 32-bit protected-mode segment (when that is implemented) you will need to use an explicit A2 prefix if you wish to use 16-bit addressing ([SI] and/or [DI] with count CX).
Here are some examples of instruction size involving 32-bit indexing in a real-mode segment: DIV B[EBX] requires an address override byte, the single instruction opcode byte 0F6H, and an effective address byte: total 3 bytes. DIV B[EBX+7] adds the offset byte 07, making the total 4 bytes. DIV B[EBX+1000] forces the offset to be 4 bytes, making the total 7 bytes. DIV B[EBX+EDI*2] does not require an offset, but the extra index register expands the effective address specifier to two bytes, making the total 4 bytes. Similarly, DIV B[ESP] requires two effective address bytes (total 4 instruction bytes), because the ESP register is a special case. Finally, DIV ES:D[EBX+EDI*2+1000] requires three overrides (segment override ES, operand override for the D, and address override for 32-bit indexing), the main opcode byte, two effective address opcode bytes, and a 4-byte offset: total 10 bytes.
The [BP] extra-byte anomaly applies, in 32-bit mode, to [EBP] as well. In fact, the anomaly also applies when another indexing register (scaled or not) is added to [EBP]. A386 must generate an offset byte whose value is 0 when it sees any no-offset forms involving [EBP].
The 386 and later processors, when running in protected mode, allow segments whose default word-size is 32 bits instead of 16 bits. In such segments, the usage of the operand and address override bytes is reversed: 32-bit operands do not require the operand-override byte, and 16-bit operands do. (8-bit operands never require an operand-override byte.) 32-bit memory addresses do not require an address-override byte; 16-bit addresses do. This mode is recognized by A386 whenever the USE32 directive is used. All DOS programs, which run in real mode, have a default of 16 bits.
The following chart summarizes the machine instructions you can program with A86. In order to use the chart, you need to learn the meanings of the specifiers (each given by 2 lower case letters) that follow most of the instruction mnemonics. Each specifier indicates the type of operand (register byte, immediate word, etc.) that follows the mnemonic to produce the given opcodes. The "v" type, for A86, is the same as "w" -- it denotes a 16-bit word. On A386, "v" denotes either a word or doubleword, depending on the presence of an operand override prefix byte.
NOTE: The following chart gives all instructions for all processors through the Pentium Pro. You must take care to use only the instructions appropriate for the target processor of your program (the P switch will enforce this for you: see Chapter 3). If an instruction form does not run on all processors, there is a letter or digit just before the description field. "N" means the instruction runs only on NEC processors (which are rare nowadays). A digit x means the instruction runs on the x86 or later: 1 for 186, 2 for 286, 3 for 386, 4 for 486, 5 for Pentium, 6 for Pentium Pro. Instructions with 3 or greater are recognized only by my A386 assembler, received only by those who purchase the extended registered package.
The 86 Instruction Set

Opcodes     Instruction    CPU Description
67 or nil   A2 (prefix)    3   Use 16-bit address (indexing) in next instruction
67 or nil   A4 (prefix)    3   Use 32-bit address (indexing) in next instruction
37          AAA                ASCII adjust AL (carry into AH) after addition
D5 0A       AAD                ASCII adjust before division (AX = 10*AH + AL)
D4 0A       AAM                ASCII adjust after multiply (AL/10: AH=Quo AL=Rem)
3F          AAS                ASCII adjust AL (borrow from AH) after subtraction
14 ib       ADC AL,ib          Add with carry immediate byte into AL
15 iv       ADC eAX,iv         Add with carry immediate vword into eAX
80 /2 ib    ADC eb,ib          Add with carry immediate byte into EA byte
10 /r       ADC eb,rb          Add with carry byte register into EA byte
83 /2 ib    ADC ev,ib          Add with carry immediate byte into EA vword
81 /2 iv    ADC ev,iv          Add with carry immediate vword into EA vword
11 /r       ADC ev,rv          Add with carry vword register into EA vword
12 /r       ADC rb,eb          Add with carry EA byte into byte register
13 /r       ADC rv,ev          Add with carry EA vword into vword register
04 ib       ADD AL,ib          Add immediate byte into AL
05 iv       ADD eAX,iv         Add immediate vword into eAX
80 /0 ib    ADD eb,ib          Add immediate byte into EA byte
00 /r       ADD eb,rb          Add byte register into EA byte
83 /0 ib    ADD ev,ib          Add immediate byte into EA vword
81 /0 iv    ADD ev,iv          Add immediate vword into EA vword
01 /r       ADD ev,rv          Add vword register into EA vword
02 /r       ADD rb,eb          Add EA byte into byte register
03 /r       ADD rv,ev          Add EA vword into vword register
0F 20       ADD4S          N   Add CL nibbles BCD, DS:SI into ES:DI (CL even,NZ)
24 ib       AND AL,ib          Logical-AND immediate byte into AL
25 iv       AND eAX,iv         Logical-AND immediate vword into eAX
80 /4 ib    AND eb,ib          Logical-AND immediate byte into EA byte
20 /r       AND eb,rb          Logical-AND byte register into EA byte
83 /4 ib    AND ev,ib          Logical-AND immediate byte into EA vword
81 /4 iv    AND ev,iv          Logical-AND immediate vword into EA vword
21 /r       AND ev,rv          Logical-AND vword register into EA vword
22 /r       AND rb,eb          Logical-AND EA byte into byte register
23 /r       AND rv,ev          Logical-AND EA vword into vword register
63 /r       ARPL ew,rw     2   Adjust RPL of EA word not smaller than RPL of rw
62 /r       BOUND rv,m2v   2   INT 5 if rw not between 2 vwords at [m] inclusive
0F BC/r     BSF rv,ev      3   Set rv to lowest position of NZ bit in ev
0F BD/r     BSR rv,ev      3   Set rv to highest position of NZ bit in ev
0F C8+r     BSWAP rd       4   Swap bytes 1,4 and 2,3 of dword register
0F BA/4 ib  BT rv/m,ib     3   Set Carry flag to bit #ib of array at rv/m
0F A3/r     BT rv/m,rv     3   Set Carry flag to bit #rv of array at rv/m
0F BA/7 ib  BTC rv/m,ib    3   Set CF to, then compl bit ib of array at rv/m
0F BB/r     BTC rv/m,rv    3   Set CF to, then compl bit rv of array at rv/m
0F BA/6 ib  BTR rv/m,ib    3   Set CF to, then reset bit ib of array at rv/m
0F B3/r     BTR rv/m,rv    3   Set CF to, then reset bit rv of array at rv/m
0F BA/5 ib  BTS rv/m,ib    3   Set CF to, then set bit ib of array at rv/m
0F AB/r     BTS rv/m,rv    3   Set CF to, then set bit rv of array at rv/m
9A cp       CALL cp            Call far segment, immediate 4- or 6-byte address
E8 cv       CALL cv            Call near, offset relative to next instruction
FF /3       CALL ep            Call far segment, address at EA memory location
FF /2       CALL ev            Call near, offset absolute at EA vword
0F FF ib    CALL80 ib      N   Call 8080-emulation code at INT number ib
98          CBW                Convert byte into word (AH = top bit of AL)
99          CDQ            3   Convert dword to qword (EDX = top bit of EAX)
F8          CLC                Clear carry flag
FC          CLD                Clear direction flag so SI and DI will increment
FA          CLI                Clear interrupt enable flag; interrupts disabled
0F 12/0     CLRBIT eb,CL   N   Clear bit CL of EA byte
0F 13/0     CLRBIT ew,CL   N   Clear bit CL of EA word
0F 1A/0 ib  CLRBIT eb,ib   N   Clear bit ib of EA byte
0F 1B/0 ib  CLRBIT ew,ib   N   Clear bit ib of EA word
0F 06       CLTS           2   Clear task switched flag
F5          CMC                Complement carry flag
0F 4n /r    CMOVcnd rv,ev  6   Move if condition (see Jcond, all conditions except eCXZ)
3C ib       CMP AL,ib          Subtract immediate byte from AL for flags only
3D iv       CMP eAX,iv         Subtract immediate vword from eAX for flags only
80 /7 ib    CMP eb,ib          Subtract immediate byte from EA byte for flags only
38 /r       CMP eb,rb          Subtract byte register from EA byte for flags only
83 /7 ib    CMP ev,ib          Subtract immediate byte from EA vword for flags only
81 /7 iv    CMP ev,iv          Subtract immediate vword from EA vword, flags only
39 /r       CMP ev,rv          Subtract vword register from EA vword for flags only
3A /r       CMP rb,eb          Subtract EA byte from byte register for flags only
3B /r       CMP rv,ev          Subtract EA vword from vword register for flags only
0F 26       CMP4S          N   Compare CL nibbles BCD, DS:SI - ES:DI (CL even,NZ)
A6          CMPS mb,mb         Compare bytes [SI] - ES:[DI], advance SI,DI
A7          CMPS mv,mv         Compare vwords [SI] - ES:[DI], advance SI,DI
A6          CMPSB              Compare bytes DS:[SI] - ES:[DI], advance SI,DI
A7          CMPSD          3   Compare dwords DS:[SI] - ES:[DI], advance SI,DI
A7          CMPSW              Compare words DS:[SI] - ES:[DI], advance SI,DI
0F C7 /1    CMPX8 mq       5   If EDXEAX=mq then mq:=ECXEBX, else EAXEDX:=mq
0F B0 /r    CMPXCHG eb,rb  4   If AL=eb then set eb to rb, else set AL to eb
0F B1 /r    CMPXCHG ev,rv  4   If eAX=ev then set ev to rv, else set eAX to ev
0F A2       CPUID          5   If EAX=1 set EDXEAX to CPU identification values
99          CWD                Convert word to doubleword (DX = top bit of AX)
98          CWDE           3   Sign-extend word AX to doubleword EAX
2E          CS (prefix)        Use CS segment for the following memory reference
27          DAA                Decimal adjust AL after addition
2F          DAS                Decimal adjust AL after subtraction
FE /1       DEC eb             Decrement EA byte by 1
FF /1       DEC ev             Decrement EA vword by 1
48+rv       DEC rv             Decrement vword register by 1
F6 /6       DIV eb             Unsigned divide AX by EA byte (AL=Quo AH=Rem)
F7 /6       DIV ev             Unsigned divide eDXeAX by EA vword (eAX=Quo eDX=Rem)
3E          DS (prefix)        Use DS segment for the following memory reference
C8 iw 00    ENTER iw,0     1   Make stack frame, iw bytes local storage, 0 levels
C8 iw 01    ENTER iw,1     1   Make stack frame, iw bytes local storage, 1 level
C8 iw ib    ENTER iw,ib    1   Make stack frame, iw bytes local storage, ib levels
26          ES (prefix)        Use ES segment for the following memory reference
F(any)                         Floating point set is in Chapter 7
F4          HLT                Halt
F6 /7       IDIV eb            Signed divide AX by EA byte (AL=Quo AH=Rem)
F7 /7       IDIV ev            Signed divide eDXeAX by EA vword (eAX=Quo eDX=Rem)
F6 /5       IMUL eb            Signed multiply (AX = AL * EA byte)
F7 /5       IMUL ev            Signed multiply (eDXeAX = eAX * EA vword)
0F AF /r    IMUL rv,ev     3   Signed multiply EA vword into vword register
6B /r ib    IMUL rv,ib     1   Signed multiply imm byte into vword register
69 /r iv    IMUL rv,iv     1   Signed multiply imm vword into vword register
69 /r iv    IMUL rv,ev,iv  1   Signed multiply (rv = EA vword * imm vword)
6B /r ib    IMUL rv,ev,ib  1   Signed multiply (rv = EA vword * imm byte)
E4 ib       IN AL,ib           Input byte from immediate port into AL
EC          IN AL,DX           Input byte from port DX into AL
E5 ib       IN eAX,ib          Input vword from immediate port into eAX
ED          IN eAX,DX          Input vword from port DX into eAX
FE /0       INC eb             Increment EA byte by 1
FF /0       INC ev             Increment EA vword by 1
40+rv       INC rv             Increment vword register by 1
6C          INS eb,DX      1   Input byte from port DX into [DI], advance DI
6D          INS ev,DX      1   Input vword from port DX into [DI], advance DI
6C          INSB           1   Input byte from port DX into ES:[DI], advance DI
6D          INSD           3   Input dword from port DX into ES:[DI], advance DI
6D          INSW           1   Input vword from port DX into ES:[DI], advance DI
CC          INT 3              Interrupt 3 (trap to debugger)        (far call, with flags
CD ib       INT ib             Interrupt numbered by immediate byte   pushed first)
CE          INTO               Interrupt 4 if overflow flag is 1
0F 08       INVD           4   Invalidate the Data Cache without writing
0F 01 /7    INVLPG m       4   Invalidate the TLB Entry that points to m
CF          IRET               Interrupt return (far return and pop flags)
CF          IRETD          3   Interrupt return (pop EIP, ECS, Eflags)
77 cb       JA cb              Jump short if above (CF=0 and ZF=0)    above=UNSIGNED
73 cb       JAE cb             Jump short if above or equal (CF=0)
72 cb       JB cb              Jump short if below (CF=1)             below=UNSIGNED
76 cb       JBE cb             Jump short if below or equal (CF=1 or ZF=1)
72 cb       JC cb              Jump short if carry (CF=1)
E3 cb       JCXZ cb            Jump short if CX register is zero
74 cb       JE cb              Jump short if equal (ZF=1)
E3 cb       JECXZ cb       3   Jump short if ECX register is zero
7F cb       JG cb              Jump short if greater (ZF=0 and SF=OF) greater=SIGNED
7D cb       JGE cb             Jump short if greater or equal (SF=OF)
7C cb       JL
cb Jump short if less (SF>OF) less=SIGNED 7E cb JLE cb Jump short if less or equal (ZF=1 or SF>OF) EB cb JMP cb Jump short (signed byte relative to next instruction) EA cp JMP cp Jump far (4- or 6-byte immediate address) E9 cv JMP cv Jump near (vword offset relative to next instruction) 0F 8n cv Jcond LONG cv 3 Jump, if condition, to offset >127 away FF /4 JMP ev Jump near to EA vword (absolute offset) FF /5 JMP md Jump far (4-byte address in memory doubleword) 76 cb JNA cb Jump short if not above (CF=1 or ZF=1) 72 cb JNAE cb Jump short if not above or equal (CF=1) 73 cb JNB cb Jump short if not below (CF=0) 77 cb JNBE cb Jump short if not below or equal (CF=0 and ZF=0) 73 cb JNC cb Jump short if not carry (CF=0) 75 cb JNE cb Jump short if not equal (ZF=0) 7E cb JNG cb Jump short if not greater (ZF=1 or SF>OF) 7C cb JNGE cb Jump short if not greater or equal (SF>OF) 7D cb JNL cb Jump short if not less (SF=OF) 7F cb JNLE cb Jump short if not less or equal (ZF=0 and SF=OF) 71 cb JNO cb Jump short if not overflow (OF=0) 7B cb JNP cb Jump short if not parity (PF=0) 79 cb JNS cb Jump short if not sign (SF=0) 75 cb JNZ cb Jump short if not zero (ZF=0) (pg 32) 70 cb JO cb Jump short if overflow (OF=1) 7A cb JP cb Jump short if parity (PF=1) 7A cb JPE cb Jump short if parity even (PF=1) 7B cb JPO cb Jump short if parity odd (PF=0) 78 cb JS cb Jump short if sign (SF=1) 74 cb JZ cb Jump short if zero (ZF=1) 9F LAHF Load: AH = flags SF ZF xx AF xx PF xx CF 0F 02 /r LAR rv,ev 2 Load: high(rw) = Access Rights byte, selector ew C5 /r LDS rv,ep Load EA pointer into DS and vword register 8D /r LEA rv,m Calculate EA offset given by m, place in rv C9 LEAVE 1 Set SP to BP, then POP BP (reverses previous ENTER) C4 /r LES rv,ep Load EA pointer into ES and vword register 0F B4 /r LFS rv,ep 3 Load EA pointer into FS and vword register 0F 01 /2 LGDT m 2 Load 6 bytes at m into Global Descriptor Table reg 0F B5 /r LGS rv,ep 3 Load EA pointer into GS and vword register 0F 01 /3 LIDT m 2 Load 
6 bytes into Interrupt Descriptor Table reg 0F 00 /2 LLDT ew 2 Load selector ew into Local Descriptor Table reg 0F 01 /6 LMSW ew 2 Load EA word into Machine Status Word F0 LOCK (prefix) Assert BUSLOCK signal for the next instruction 0F 33/r LODBITS rb,rb N Load AX with DS:SI,bit rb (incr. SI,rb), rb+1 bits 0F 3B/0 ib LODBITS rb,ib N Load AX with DS:SI,bit rb (incr. SI,rb), ib+1 bits AC LODS mb Load byte [SI] into AL, advance SI AD LODS mv Load vword [SI] into eAX, advance SI AC LODSB Load byte [SI] into AL, advance SI AD LODSD 3 Load dword [SI] into EAX, advance SI AD LODSW Load word [SI] into AX, advance SI E2 cb LOOP cb noflags DEC CX; jump short if CX>0 E1 cb LOOPE cb noflags DEC CX; jump short if CX>0 and equal (ZF=1) E0 cb LOOPNE cb noflags DEC CX; jump short if CX>0 and not equal E0 cb LOOPNZ cb noflags DEC CX; jump short if CX>0 and ZF=0 E1 cb LOOPZ cb noflags DEC CX; jump short if CX>0 and zero (ZF=1) 0F 03 /r LSL rv,ev 2 Load: rv = Segment Limit, selector ev 0F B2 /r LSS rv,ep 3 Load EA pointer into SS and vword register 0F 00 /3 LTR ew 2 Load EA word into Task Register A0 iv MOV AL,xb Move byte variable (offset iv) into AL A1 iv MOV eAX,xv Move vword variable (offset iv) into eAX 0F 22 /4 MOV CR4,rd 5 Move rd into control register 4 0F 22 /n MOV CRn,rd 3 Move rd into control register n (=0,2, or 3) 0F 23 /n MOV DRn,rd 3 Move rd into debug register n (=0,1,2,3) 0F 23 /n MOV DRn,rd 3 Move rd into debug register n (=6,7) 0F 26 /n MOV TRn,rd 3 Move rd into test register TRn (=6,7) C6 /0 ib MOV eb,ib Move immediate byte into EA byte 88 /r MOV eb,rb Move byte register into EA byte C7 /0 iv MOV ev,iv Move immediate vword into EA vword 89 /r MOV ev,rv Move vword register into EA vword 8C /r MOV ew,segreg Move segment register into EA word B0+rb ib MOV rb,ib Move immediate byte into byte register 8A /r MOV rb,eb Move EA byte into byte register 0F 20 /4 MOV rd,CR4 5 Move control register 4 into rd 0F 20 /n MOV rd,CRn 3 Move control register n (=0,2, or 3) into rd 
0F 21 /n MOV rd,DRn 3 Move debug register n (=0,1,2,3) into rd 0F 21 /n MOV rd,DRn 3 Move debug register n (=6,7) into rd 0F 24 /n MOV rd,TRn 3 Move test register TRn (=6,7) into rd B8+rw iv MOV rv,iv Move immediate vword into vword register 8B /r MOV rv,ev Move EA vword into vword register 8E /r MOV segreg,mw Move EA word into segment register (except CS) A2 iv MOV xb,AL Move AL into byte variable (offset iv) A3 iv MOV xv,eAX Move eAX into vword register (offset iv) A4 MOVS mb,mb Move byte [SI] to ES:[DI], advance SI,DI (pg 33) A5 MOVS mv,mv Move vword [SI] to ES:[DI], advance SI,DI A4 MOVSB Move byte DS:[SI] to ES:[DI], advance SI,DI A5 MOVSD 3 Move dword DS:[SI] to ES:[DI], advance SI,DI A5 MOVSW Move word DS:[SI] to ES:[DI], advance SI,DI 0F BF /r MOVSX rd,ew 3 Move word to dword, with sign-extend 0F BE /r MOVSX rv,eb 3 Move byte to vword, with sign-extend 0F B7 /r MOVZX rd,ew 3 Move word to dword, with zero-extend 0F B6 /r MOVZX rv,eb 3 Move byte to vword, with zero-extend 8C /r MOVZX rw,seg 3 Move segment register into EA word F6 /4 MUL eb Unsigned multiply (AX = AL * EA byte) F7 /4 MUL ev Unsigned multiply (eDXeAX = eAX * EA vword) F6 /3 NEG eb Two's complement negate EA byte F7 /3 NEG ev Two's complement negate EA vword NIL (prefix) Special "do-nothing" opcode assembles no code 90 NOP No Operation F6 /2 NOT eb Reverse each bit of EA byte F7 /2 NOT ev Reverse each bit of EA word 0F 16/0 NOTBIT eb,CL N Complement bit CL of EA byte 0F 17/0 NOTBIT ew,CL N Complement bit CL of EA word 0F 1E/0 ib NOTBIT eb,ib N Complement bit ib of EA byte 0F 1F/0 ib NOTBIT ew,ib N Complement bit ib of EA word 66 or nil O2 (prefix) 3 Use 16-bit data operand in the next instruction 66 or nil O4 (prefix) 3 Use 32-bit data operand in the next instruction 0C ib OR AL,ib Logical-OR immediate byte into AL 0D iv OR eAX,iv Logical-OR immediate word into eAX 80 /1 ib OR eb,ib Logical-OR immediate byte into EA byte 08 /r OR eb,rb Logical-OR byte register into EA byte 83 /1 ib OR ev,ib 
Logical-OR immediate byte into EA word 81 /1 iv OR ev,iv Logical-OR immediate word into EA word 09 /r OR ev,rv Logical-OR word register into EA word 0A /r OR rb,eb Logical-OR EA byte into byte register 0B /r OR rv,ev Logical-OR EA word into word register E6 ib OUT ib,AL Output byte AL to immediate port number ib E7 ib OUT ib,eAX Output word eAX to immediate port number ib EE OUT DX,AL Output byte AL to port number DX EF OUT DX,eAX Output word eAX to port number DX 6E OUTS DX,eb 1 Output byte [SI] to port number DX, advance SI 6F OUTS DX,ev 1 Output word [SI] to port number DX, advance SI 6E OUTSB 1 Output byte DS:[SI] to port number DX, advance SI 6F OUTSD 3 Output dword DS:[SI] to port number DX, advance SI 6F OUTSW 1 Output word DS:[SI] to port number DX, advance SI 1F POP DS Set DS to top of stack, increment SP by 2 07 POP ES Set ES to top of stack, increment SP by 2 0F A1 POP FS 3 Set FS to top of stack, increment SP by 2 0F A9 POP GS 3 Set GS to top of stack, increment SP by 2 8F /0 POP mv Set memory word to top of stack, increment SP by 2 58+rw POP rv Set word register to top of stack, increment SP by 2 17 POP SS Set SS to top of stack, increment SP by 2 61 POPA 1 Pop DI,SI,BP,x,BX,DX,CX,AX (SP value is ignored) 61 POPAD 3 Pop EDI,ESI,EBP,x,EBX,EDX,ECX,EAX (ESP ign.) 
9D POPF Set flags register to top of stack, increment SP by 2 9D POPFD 3 Set eflags reg to top of stack, incr SP by 2 0E PUSH CS Set [SP-2] to CS, then decrement SP by 2 1E PUSH DS Set [SP-2] to DS, then decrement SP by 2 06 PUSH ES Set [SP-2] to ES, then decrement SP by 2 0F A0 PUSH FS 3 Set [SP-2] to FS, then decrement SP by 2 0F A8 PUSH GS 3 Set [SP-2] to GS, then decrement SP by 2 6A ib PUSH ib 1 Push sign-extended immediate byte 68 iv PUSH iv 1 Set [SP-v] to immediate vword, then decrement SP by v (pg 34) FF /6 PUSH mv Set [SP-v] to memory vword, then decrement SP by v 50+rv PUSH rv Set [SP-v] to vword register, then decrement SP by v 16 PUSH SS Set [SP-2] to SS, then decrement SP by 2 60 PUSHA 1 Push AX,CX,DX,BX,original SP,BP,SI,DI 60 PUSHAD 3 Push EAX,ECX,EDX,EBX,original ESP,EBP,ESI,EDI 68 id PUSHD id 3 Set [SP-4] to immediate dword, then decrement SP by 4 9C PUSHF Set [SP-2] to flags register, then decrement SP by 2 9C PUSHFD 3 Set [SP-4] to eflags reg, then decr SP by 4 68 iw PUSHW iw 3 Set [SP-2] to immediate word, then decrement SP by 2 D0 /2 RCL eb,1 Rotate 9-bit quantity (CF, EA byte) left once D2 /2 RCL eb,CL Rotate 9-bit quantity (CF, EA byte) left CL times C0 /2 ib RCL eb,ib 1 Rotate 9-bit quantity (CF, EA byte) left ib times D1 /2 RCL ev,1 Rotate v+1-bit quantity (CF, EA word) left once D3 /2 RCL ev,CL Rotate v+1-bit quantity (CF, EA word) left CL times C1 /2 ib RCL ev,ib 1 Rotate v+1-bit quantity (CF, EA word) left ib times D0 /3 RCR eb,1 Rotate 9-bit quantity (CF, EA byte) right once D2 /3 RCR eb,CL Rotate 9-bit quantity (CF, EA byte) right CL times C0 /3 ib RCR eb,ib 1 Rotate 9-bit quantity (CF, EA byte) right ib times D1 /3 RCR ev,1 Rotate v+1-bit quantity (CF, EA word) right once D3 /3 RCR ev,CL Rotate v+1-bit quantity (CF, EA word) right CL times C1 /3 ib RCR ev,ib 1 Rotate v+1-bit quantity (CF, EA word) right ib times 0F 32 RDMSR 5 Read Model Specific Reg #ECX to EDXEAX 0F 33 RDPMC 6 Read Performance Monitoring Counter #ECX to EDXEAX 0F 31 
RDTSC 5 Read Time Stamp Counter to EDXEAX F3 REP (prefix) Repeat following MOVS,LODS,STOS,INS, or OUTS CX times 65 REPC (prefix) N Repeat following CMPS or SCAS CX times or until CF=0 F3 REPE (prefix) Repeat following CMPS or SCAS CX times or until ZF=0 64 REPNC (prfix) N Repeat following CMPS or SCAS CX times or until CF=1 F2 REPNE (prfix) Repeat following CMPS or SCAS CX times or until ZF=1 F2 REPNZ (prfix) Repeat following CMPS or SCAS CX times or until ZF=1 F3 REPZ (prefix) Repeat following CMPS or SCAS CX times or until ZF=0 C3 RET Return to caller (near or far, depending on PROC) C2 iw RET iw RET, then pop iw bytes pushed before Call CB RETF Return to far caller (pop offset, then seg) CA iw RETF iw RET (far), pop offset, seg, iw bytes C3 RETN Return to near caller (pop offset only) C2 iw RETN iw RET (near), pop offset, iw bytes pushed before Call D0 /0 ROL eb,1 Rotate 8-bit EA byte left once D2 /0 ROL eb,CL Rotate 8-bit EA byte left CL times C0 /0 ib ROL eb,ib 1 Rotate 8-bit EA byte left ib times D1 /0 ROL ev,1 Rotate 16- or 32-bit EA vword left once D3 /0 ROL ev,CL Rotate 16- or 32-bit EA vword left CL times C1 /0 ib ROL ev,ib 1 Rotate 16 or 32-bit EA vword left ib times 0F 28/0 ROL4 eb N Rotate nibbles: Heb=Leb HAL,Leb=LAL LAL=Heb D0 /1 ROR eb,1 Rotate 8-bit EA byte right once D2 /1 ROR eb,CL Rotate 8-bit EA byte right CL times C0 /1 ib ROR eb,ib 1 Rotate 8-bit EA byte right ib times D1 /1 ROR ev,1 Rotate 16- or 32-bit EA vword right once D3 /1 ROR ev,CL Rotate 16- or 32-bit EA vword right CL times C1 /1 ib ROR ev,ib 1 Rotate 16- or 32-bit EA vword right ib times 0F 2A/0 ROR4 eb N Rotate nibbles: Leb=Heb Heb=LAL AL=eb 0F AA RSM 5 Resume from System Management mode 9E SAHF Store AH into flags SF ZF xx AF xx PF xx CF D0 /4 SAL eb,1 Multiply EA byte by 2, once D2 /4 SAL eb,CL Multiply EA byte by 2, CL times C0 /4 ib SAL eb,ib 1 Multiply EA byte by 2, ib times D1 /4 SAL ev,1 Multiply EA vword by 2, once D3 /4 SAL ev,CL Multiply EA vword by 2, CL times C1 /4 ib 
SAL ev,ib 1 Multiply EA vword by 2, ib times (pg 35) D0 /7 SAR eb,1 Signed divide EA byte by 2, once D2 /7 SAR eb,CL Signed divide EA byte by 2, CL times C0 /7 ib SAR eb,ib 1 Signed divide EA byte by 2, ib times D1 /7 SAR ev,1 Signed divide EA vword by 2, once D3 /7 SAR ev,CL Signed divide EA vword by 2, CL times C1 /7 ib SAR ev,ib 1 Signed divide EA vword by 2, ib times 1C ib SBB AL,ib Subtract with borrow immediate byte from AL 1D iv SBB eAX,iv Subtract with borrow immediate word from eAX 80 /3 ib SBB eb,ib Subtract with borrow immediate byte from EA byte 18 /r SBB eb,rb Subtract with borrow byte register from EA byte 83 /3 ib SBB ev,ib Subtract with borrow immediate byte from EA word 81 /3 iv SBB ev,iv Subtract with borrow immediate word from EA word 19 /r SBB ev,rv Subtract with borrow word register from EA word 1A /r SBB rb,eb Subtract with borrow EA byte from byte register 1B /r SBB rv,ev Subtract with borrow EA word from word register AE SCAS mb Compare bytes AL - ES:[DI], advance DI AF SCAS mv Compare vwords eAX - ES:[DI], advance DI AE SCASB Compare bytes AL - ES:[DI], advance DI AF SCASD 3 Compare dwords EAX - ES:[DI], advance DI AF SCASW Compare words AX - ES:[DI], advance DI 0F 14/0 SETBIT eb,CL N Set bit CL of EA byte 0F 15/0 SETBIT ew,CL N Set bit CL of EA word 0F 1C/0 ib SETBIT eb,ib N Set bit ib of EA byte 0F 1D/0 ib SETBIT ew,ib N Set bit ib of EA word 0F 9n /r SETcond eb 3 Set EA byte to 1 if condition, 0 if not (all conds except eCXZ) 0F 01 /0 SGDT m 2 Store 6-byte Global Descriptor Table register to m D0 /4 SHL eb,1 Multiply EA byte by 2, once D2 /4 SHL eb,CL Multiply EA byte by 2, CL times C0 /4 ib SHL eb,ib 1 Multiply EA byte by 2, ib times D1 /4 SHL ev,1 Multiply EA word by 2, once D3 /4 SHL ev,CL Multiply EA word by 2, CL times C1 /4 ib SHL ev,ib 1 Multiply EA word by 2, ib times 0F A5/r SHLD ev,rv,CL 3 Set ev to high of ((ev,rv) SHL CL) 0F A4/r ib SHLD ev,rv,ib 3 Set ev to high of ((ev,rv) SHL ib) D0 /5 SHR eb,1 Unsigned divide EA byte by 
2, once D2 /5 SHR eb,CL Unsigned divide EA byte by 2, CL times C0 /5 ib SHR eb,ib 1 Unsigned divide EA byte by 2, ib times D1 /5 SHR ev,1 Unsigned divide EA word by 2, once D3 /5 SHR ev,CL Unsigned divide EA word by 2, CL times C1 /5 ib SHR ev,ib 1 Unsigned divide EA word by 2, ib times 0F AD/r SHRD ev,rv,CL 3 Set ev to low of ((rv,ev) SHR CL) 0F AC/r ib SHRD ev,rv,ib 3 Set ev to low of ((rv,ev) SHR ib) 0F 01 /1 SIDT m 2 Store 6-byte Interrupt Descriptor Table register to m 0F 00 /0 SLDT ew 2 Store Local Descriptor Table register to EA word 0F 01 /4 SMSW ew 2 Store Machine Status Word to EA word 36 SS Use SS segment for the following memory reference F9 STC Set carry flag FD STD Set direction flag so SI and DI will decrement FB STI Set interrupt enable flag, interrupts enabled 0F 31/r STOBITS rb,rb N Store AX to ES:DI,bit rb (incr. DI,rb), rb+1 bits 0F 39/0 ib STOBITS rb,ib N Store AX to ES:DI,bit rb (incr. DI,rb), ib+1 bits AA STOS mb Store AL to byte [DI], advance DI AB STOS mv Store eAX to word [DI], advance DI AA STOSB Store AL to byte ES:[DI], advance DI AB STOSD 3 Store EAX to dword ES:[DI], advance DI AB STOSW Store AX to word ES:[DI], advance DI 0F 00 /1 STR ew 2 Store Task Register to EA word 2C ib SUB AL,ib Subtract immediate byte from AL (pg 36) 2D iv SUB eAX,iv Subtract immediate word from eAX 80 /5 ib SUB eb,ib Subtract immediate byte from EA byte 28 /r SUB eb,rb Subtract byte register from EA byte 83 /5 ib SUB ev,ib Subtract immediate byte from EA word 81 /5 iv SUB ev,iv Subtract immediate word from EA word 29 /r SUB ev,rv Subtract word register from EA word 2A /r SUB rb,eb Subtract EA byte from byte register 2B /r SUB rv,ev Subtract EA word from word register 0F 22 SUB4S N Sub CL nibbles BCD, DS:SI - ES:DI (CL even,NZ) A8 ib TEST AL,ib AND immediate byte into AL for flags only A9 iv TEST eAX,iv AND immediate word into eAX for flags only F6 /0 ib TEST eb,ib AND immediate byte into EA byte for flags only 84 /r TEST eb,rb AND byte register into EA byte 
for flags only F7 /0 iv TEST ev,iv AND immediate word into EA word for flags only 85 /r TEST ev,rv AND word register into EA word for flags only 84 /r TEST rb,eb AND EA byte into byte register for flags only 85 /r TEST rv,ev AND EA word into word register for flags only 0F 10/0 TESTBIT eb,CL N Test bit CL of EA byte, set Z flag 0F 11/0 TESTBIT ev,CL N Test bit CL of EA word, set Z flag 0F 18/0 ib TESTBIT eb,ib N Test bit ib of EA byte, set Z flag 0F 19/0 ib TESTBIT ew,ib N Test bit ib of EA word, set Z flag 0F 00 /4 VERR ew 2 Set ZF=1 if segment can be read, selector ew 0F 00 /5 VERW ew 2 Set ZF=1 if segment can be written to, selector ew 9B WAIT Wait until floating-point operation is completed 0F 09 WBINVD 4 Write Back and Invalidate the Data Cache 0F 30 WRMSR 5 Write EDXEAX to Model Specific Reg #ECX 0F C0 /r XADD eb,rb 4 Exchange eb with rb then add into new eb 0F C1 /r XADD ev,rv 4 Exchange ev with rv then add into new ev 9r XCHG eAX,rv Exchange eAX with vword register 86 /r XCHG eb,rb Exchange byte register with EA byte 87 /r XCHG ev,rv Exchange vword register with EA vword 86 /r XCHG rb,eb Exchange EA byte with byte register 9r XCHG rv,eAX Exchange eAX with vword register 87 /r XCHG rv,ev Exchange EA vword with vword register D7 XLAT mb Set AL to memory byte [BX + unsigned AL] D7 XLATB Set AL to memory byte DS:[BX + unsigned AL] 34 ib XOR AL,ib Exclusive-OR immediate byte into AL 35 iv XOR eAX,iv Exclusive-OR immediate word into eAX 80 /6 ib XOR eb,ib Exclusive-OR immediate byte into EA byte 30 /r XOR eb,rb Exclusive-OR byte register into EA byte 83 /6 ib XOR ev,ib Exclusive-OR immediate byte into EA word 81 /6 iv XOR ev,iv Exclusive-OR immediate word into EA word 31 /r XOR ev,rv Exclusive-OR word register into EA word 32 /r XOR rb,eb Exclusive-OR EA byte into byte register 33 /r XOR rv,ev Exclusive-OR EA word into word register
"N" next to the instruction description means that instruction works only on NEC chips. A digit x means that instruction works only on the x86 or later processor. See the note just before the chart. (pg 37)
In this chapter, we'll refer to the various Central Processing Units (CPUs) as the "86". Thus "86" refers to any of the 8088, 8086, 80186, 80286, etc. We'll refer to the various coprocessors as the "87". Thus "87" refers to any of the 8087, the 287, the 387, the special IIT-2C87 processor, or the floating-point units built into the 486 and beyond.
Most Intel-based computer systems have the facility to work with floating-point numbers. On older computers (the 8088 up through the 386), this facility was provided by a separate coprocessor chip, that one could buy as an option and plug into an extra socket on the computer's motherboard. Starting with the 486, the floating-point processor has been integrated into the main CPU chip. There are three generations of 87 chips: the original 8087 that worked with the 8086/8088; the 287, that worked with the 286; and the 387, that worked with the 386. From a programming standpoint, the 8087 and 287 are nearly identical: the 287 adds the instructions FSETPM and FSTSW AX, and ignores the instructions FENI and FDISI. There is, however, a rather nasty design flaw in the 8087, that was corrected in the 287.
To understand the flaw, you must understand how the 86 and 87 work as coprocessors. Whenever the 86 sees a floating point instruction, it communicates the instruction, and any associated memory operands, to the 87. Then the 86 goes on to its next instruction, operating in parallel with the 87. That's OK, so long as the following instructions don't depend on the 87 having finished: for example, by executing another floating point instruction, or by reading the results of the instruction still in progress.
If they do, then you must provide an instruction called WAIT (or synonymously FWAIT), which halts the 86 until the 87 is finished. For almost all floating point instructions, it should not be necessary to provide an explicit FWAIT; the 86 ought to know that it should wait. For the 8087, it IS necessary to give an explicit FWAIT before each floating point instruction: that is the flaw.
Because of the flaw, all assemblers supporting the 8087 will silently insert an FWAIT code (hex 9B) before all 87 instructions, except those few (the FN instructions other than FNOP) not requiring the FWAIT. A86 will insert the opcode as well, when it is assembling for the original 8087.
There are three ways to tell A86 whether it is assembling for an 8087 or a 287-or-later processor. First, A86 will use a default for the processor on which it is currently assembling: no .287 for an 8086, 8088, 186, or NEC; .287 for a 286 or later. Second, this can be overridden by the switch +F (the F must be capitalized), to signal that the 287 is the target processor, or -F to specify the 8087. Third, an 8087 setting can be further overridden in the source code, with the directive ".287", compatible with Microsoft's assembler.
When A86 is assembling for the 287 or later, it ceases outputting the FWAIT opcodes that are unnecessary for the 287, ignores the instructions FENI, FDISI, FNENI, and FNDISI, and honors the instructions FSETPM and FSTSW AX.
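As a minimal sketch of the difference, assume the assembly starts with the 8087 as the target (for example, via the -F switch). The same instruction then assembles to different bytes on either side of a .287 directive. The byte values are taken from the FLD1 entry in the chart at the end of this chapter.

```asm
        FLD1            ; 8087 target: 9B D9 E8 (FWAIT opcode, then FLD1)
.287                    ; switch the target to the 287 or later
        FLD1            ; 287 target: D9 E8 (no FWAIT inserted)
```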
WARNING: The most common mistake 87 programmers make is to try to read the results of an 87 operation in 86 memory, before the results are ready. At least on my computer, the system often crashes when you do this! If your program runs correctly when single stepped, but crashes when set loose, then chances are you need an extra explicit FWAIT somewhere.
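The following fragment is a sketch of where such an explicit FWAIT belongs. The variable name RESULT is hypothetical, and in a real program the data declaration would sit outside the path of execution.

```asm
RESULT  DW ?            ; hypothetical word variable, for illustration only

        FISTP RESULT    ; the 87 begins storing its top stack element as an integer
        FWAIT           ; halt the 86 until the 87 has finished the store
        MOV AX,RESULT   ; only now is it safe to read RESULT
```

Without the FWAIT, the MOV could read RESULT before the 87 has written it.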
A86 supports two additional coprocessors available for PC-compatibles: the 80387, available for 386-based machines, and the IIT-2C87, a 287-plug-compatible chip that adds a couple of unique instructions. The IIT-2C87 has two extra banks of on-chip 8-number stacks, that can be switched in with the FBANK instruction, and a matrix multiply instruction that uses all three banks as input. (pg 38)
Both chips incorporate the correction to the 8087's FWAIT design flaw, so you can assemble with the .287 directive. The extra instructions for these chips are marked by "387:" and "IIT only:" in the chart at the end of this chapter. In addition, A386 supports some new floating-point instructions for the Pentium Pro processor; these instructions are marked by "P6:".
There is a software package provided with many compilers (Borland's Turbo C and most Microsoft compilers, for example) that emulates the 8087 instruction set. The emulator is very cleverly implemented so that the programmer need not know whether a floating point chip will be available, or whether emulation will be necessary. This is done by having the linker replace all floating point machine instructions with INT calls to certain interrupts, dedicated to emulation. The interrupt handlers interpret the operands to the instructions, and emulate the 8087.
You can tell A86 that the emulator might be used, by providing a +f switch in the invocation line, or in the A86 environment variable (make sure the f is lower case). Since your program will be linked to the emulator, you must be producing an OBJ file, not a COM file, for emulation support to take effect. Whenever a floating point instruction is assembled, A86 will generate an external reference at the opcode for the instruction. Then, if the emulation package is linked with your program, the opcodes will be replaced by the INT calls. If a special non-emulation module is linked, the opcodes will be left alone, and the floating point instructions will be executed directly.
For the later processors (286 and beyond), emulation can be provided that executes when the floating-point instructions themselves are seen, so the +f games are not necessary.
The 87 has its own register set, of 8 floating point numbers occupying 10 bytes each, plus 14 bytes of status and control information. Many of the 87's instructions cause the numbers to act like a stack, much like a Hewlett-Packard calculator. For this reason, the numbers are called the floating point stack.
The standard name for the top element of the floating point stack is either ST or ST(0); the others are named ST(1) through ST(7). Thus, for example, the instruction to add stack element number 3 into the top stack element is usually coded FADD ST,ST(3).
I find this notation painfully verbose. Especially bad are the parentheses, which are hard to type, and which add visual clutter to the program. To alleviate this problem while retaining language compatibility, I name my stack elements simply 0 through 7. I recognize ST as a synonym for 0. I allow expression elements to be concatenated; concatenation is the same as addition. Thus, when A86 sees ST(3), it computes 0+3 = 3. So you can code the old way, FADD ST,ST(3), or you can code the concise way, FADD 0,3 or simply FADD 3.
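To make the equivalence concrete, here are the three spellings side by side. According to the FADD entries in the chart at the end of this chapter (D8 C0+i, with i = 3), all three should assemble to the same opcode bytes, D8 C3, preceded by an FWAIT when the 8087 is the target.

```asm
        FADD ST,ST(3)   ; standard notation
        FADD 0,3        ; A86 notation, stack elements named by number
        FADD 3          ; shortest form, destination 0 implied
```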
In general, you use the 87 by loading numbers from 86 memory to the 87 stack (using FLD instructions), calculating on the 87 stack, and storing the results back to 86 memory (using FST and FSTP instructions). There are seven constant numbers built into the 87 instruction set: zero, one, Pi, and four logarithmic conversion constants. These can be loaded using the FLDZ, FLD1, FLDPI, FLDL2T, FLDL2E, FLDLG2, and FLDLN2 instructions. All other constants must be declared in, then loaded from, 86 memory. Integer constant words and doublewords can be loaded via FILD. Non-integer constant doublewords, quadwords, and ten-byte numbers can be loaded via FLD.
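A minimal sketch of the load, calculate, store pattern follows. The variable names are hypothetical, and the data declarations are shown inline only for brevity; in a real program they would sit outside the path of execution.

```asm
FIRST_NUM   DD 1.5      ; hypothetical 4-byte real variables
SECOND_NUM  DD 2.25
TOTAL       DD ?        ; receives the sum

        FLD FIRST_NUM   ; push FIRST_NUM onto the 87 stack
        FADD SECOND_NUM ; 0 := 0 + SECOND_NUM
        FSTP TOTAL      ; store the result to 86 memory and pop the stack
        FWAIT           ; let the store finish before the 86 reads TOTAL
```

FSTP is used rather than FST so that the 87 stack is left as it was found.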
A86 allows you to declare constants loaded via FLD as floating point numbers, using scientific notation if you like. As an exclusive feature, A86 allows you to use any of the 4 arithmetic functions +, -, *, / in expressions involving floating point numbers. A86 will even do type conversion if one of the two operands is given as an integer; though for clarity I recommend that you always give floating point constants with their decimal point. (pg 39)
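A few declarations of this kind might look as follows. The names are hypothetical, and the exponent is written in the usual E form, which is an assumption about the exact syntax A86 accepts for scientific notation.

```asm
ONE_HALF    DD 0.5          ; 4-byte real
BIG_NUMBER  DQ 6.02E23      ; 8-byte real, scientific notation
TWO_THIRDS  DT 2.0/3.0      ; 10-byte real from a floating point expression
SCALED      DT 1.5*4        ; the integer 4 is converted; 1.5*4.0 is clearer
```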
As another exclusive feature, A86 offers the following built-in symbols:
PI     ratio of circumference to diameter of a circle; 3.14159...
L2T    log base 2 of 10; 3.321928...
L2E    log base 2 of the calculus constant e = 2.71828...
LG2    log base 10 of 2; 0.301029...
LN2    natural log (base e) of 2; 0.693147...
You can use these symbols in expressions, to declare useful constants. For example, you can declare the degrees-to-radians conversion constant as follows:
DEG_TO_RAD DT PI/180.
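As a sketch of how such a constant might be used, the fragment below converts an angle from degrees to radians. ANGLE_DEG is a hypothetical variable, and FMUL with no operands is assumed to behave like the no-operand FADD in the chart (operate on stack elements 1 and 0, then pop).

```asm
DEG_TO_RAD  DT PI/180.      ; the conversion constant
ANGLE_DEG   DD 45.0         ; hypothetical angle in degrees

        FLD ANGLE_DEG       ; push the angle
        FLD DEG_TO_RAD      ; push the conversion constant
        FMUL                ; 1 := 1 * 0, pop: the angle in radians is now on top
```

The constant is loaded with FLD first because it is a ten-byte number, and the arithmetic instructions in the chart take only 4-byte and 8-byte real memory operands.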
Yet another exclusive A86 feature is the instruction form FLD constant. This form is intended primarily to facilitate "fooling around" with the 87 when using D86; but it is also useful for quick-and-dirty programs. For example, the instruction FLD 12.3 generates the following sequence of code bytes (without explicitly using the local labels given).
    CS FLD T[M1]
    JMP >M2
M1  DT 12.3
M2:
Obviously, this form is not terrifically efficient: you can always save the JMP by placing the constant outside of the instruction stream; and the CS override might not be needed. But the form is very, very convenient!
NOTE that the preceding 2 sections imply that you can get careless and code, for example, FLD PI when you intended FLDPI. Though the two are functionally equivalent, the first form takes a whopping 17 bytes; the second, only 2 bytes. Be careful!
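The byte counts can be tallied as in the sketch below. The breakdown assumes the constant is addressed with a 2-byte offset and does not count any FWAIT opcode that A86 may insert for the 8087.

```asm
; FLD PI uses the FLD constant form, which expands to roughly:
;   CS override             1 byte
;   FLD T[...]              4 bytes (opcode, mode byte, 2-byte offset)
;   JMP over the constant   2 bytes
;   DT constant            10 bytes
;                          total: 1 + 4 + 2 + 10 = 17 bytes
        FLD PI

; FLDPI is a single 87 instruction (D9 EB), 2 bytes:
        FLDPI
```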
The list of floating point instructions contains a variety of operand types. Here is a brief explanation of those types.
In the "standard" assembly language, the choice of operands for floating point instructions seems inconsistent to me. For example, to subtract stack i from 0, you must provide two operands; to do the equivalent comparison, you must provide only one operand. A86 smooths out these inconsistencies by allowing more choices for operands: FADD i is equivalent to FADD 0,i. FCOM 0,i is equivalent to FCOM i. The same holds for the other main arithmetic instructions. FXCH 0,i and FXCH i,0 are allowed. So if you wish to retain compatibility with other assemblers, you should use their more restrictive instruction list, not the following one. (pg 40)
Following is the 87 instruction set. The w in the opcode field is the FWAIT opcode, hex 9B, which is suppressed if .287 is selected. Again, "0", "1", and "i" stand for the associated floating point stack registers, not constant numbers! Constant numbers in the descriptions are given with decimal points: 0.0, 1.0, 2.0, 10.0.
Opcode      Instruction     Description
w D9 F0     F2XM1           0 := (2.0 ** 0) - 1.0
w DB F1     F4X4            IIT only: 4 by 4 matrix multiply
w D9 E1     FABS            0 := |0|
w DE C1     FADD            1 := 1 + 0, pop
w D8 C0+i   FADD i          0 := i + 0
w DC C0+i   FADD i,0        i := i + 0
w D8 C0+i   FADD 0,i        0 := i + 0
w D8 /0     FADD mem4r      0 := 0 + mem4r
w DC /0     FADD mem8r      0 := 0 + mem8r
w DE C0+i   FADDP i,0       i := i + 0, pop
w DB E8     FBANK 0         IIT only: set bank pointer to default
w DB EB     FBANK 1         IIT only: set bank pointer to bank 1
w DB EA     FBANK 2         IIT only: set bank pointer to bank 2
w DF /4     FBLD mem10d     push, 0 := mem10d
w DF /6     FBSTP mem10d    mem10d := 0, pop
w D9 E0     FCHS            0 := -0
9B DB E2    FCLEX           clear exceptions
DB D0+i     FCMOVA 0,i      P6: If Above then 0 := i
DB C0+i     FCMOVAE 0,i     P6: If Above or Equal then 0 := i
DA C0+i     FCMOVB 0,i      P6: If Below then 0 := i
DA D0+i     FCMOVBE 0,i     P6: If Below or Equal then 0 := i
DA C8+i     FCMOVE 0,i      P6: If Equal then 0 := i
DA D0+i     FCMOVNA 0,i     P6: If Not Above then 0 := i
DA C0+i     FCMOVNAE 0,i    P6: If Not Above-or-Equal then 0 := i
DB C0+i     FCMOVNB 0,i     P6: If Not Below then 0 := i
DB D0+i     FCMOVNBE 0,i    P6: If Not Below-or-Equal then 0 := i
DB C8+i     FCMOVNE 0,i     P6: If Not Equal then 0 := i
DB D8+i     FCMOVNU 0,i     P6: If Not Unordered then 0 := i
DB C8+i     FCMOVNZ 0,i     P6: If Non-Zero then 0 := i
DA D8+i     FCMOVU 0,i      P6: If Unordered then 0 := i
DA C8+i     FCMOVZ 0,i      P6: If Zero then 0 := i
            FCMOVcond i     P6: All FCMOV forms also accept a single i operand
w D8 D1     FCOM            compare 0 - 1
w D8 D0+i   FCOM 0,i        compare 0 - i
w D8 D0+i   FCOM i          compare 0 - i
w D8 /2     FCOM mem4r      compare 0 - mem4r
w DC /2     FCOM mem8r      compare 0 - mem8r
DB F1       FCOMI           P6: compare 0 - 1, setting integer flags
DB F0+i     FCOMI 0,i       P6: compare 0 - i, setting integer flags
DB F0+i     FCOMI i         P6: compare 0 - i, setting integer flags
DF F1       FCOMIP          P6: compare 0 - 1, setting integer flags, and pop
DF F0+i     FCOMIP 0,i      P6: compare 0 - i, setting integer flags, and pop
DF F0+i     FCOMIP i        P6: compare 0 - i, setting integer flags, and pop
w D8 D9     FCOMP           compare 0 - 1, pop
w D8 D8+i   FCOMP 0,i       compare 0 - i, pop
w D8 D8+i   FCOMP i         compare 0 - i, pop
w D8 /3     FCOMP mem4r     compare 0 - mem4r, pop
w DC /3     FCOMP mem8r     compare 0 - mem8r, pop
w DE D9     FCOMPP          compare 0 - 1, pop both
(pg 41)
w D9 FF     FCOS            387: 0 := cosine(0)
w D9 F6     FDECSTP         decrement stack pointer
w DB E1     FDISI           disable interrupts (.287 ignore)
w DE F9     FDIV            1 := 1 / 0, pop
w D8 F0+i   FDIV i          0 := 0 / i
w DC F8+i   FDIV i,0        i := i / 0
w D8 F0+i   FDIV 0,i        0 := 0 / i
w D8 /6     FDIV mem4r      0 := 0 / mem4r
w DC /6     FDIV mem8r      0 := 0 / mem8r
w DE F8+i   FDIVP i,0       i := i / 0, pop
w DE F1     FDIVR           1 := 0 / 1, pop
w D8 F8+i   FDIVR i         0 := i / 0
w DC F0+i   FDIVR i,0       i := 0 / i
w D8 F8+i   FDIVR 0,i       0 := i / 0
w D8 /7     FDIVR mem4r     0 := mem4r / 0
w DC /7     FDIVR mem8r     0 := mem8r / 0
w DE F0+i   FDIVRP i,0      i := 0 / i, pop
w DB E0     FENI            enable interrupts (.287 ignore)
w DD C0+i   FFREE i         empty i
w DE /0     FIADD mem2i     0 := 0 + mem2i
w DA /0     FIADD mem4i     0 := 0 + mem4i
w DE /2     FICOM mem2i     compare 0 - mem2i
w DA /2     FICOM mem4i     compare 0 - mem4i
w DE /3     FICOMP mem2i    compare 0 - mem2i, pop
w DA /3     FICOMP mem4i    compare 0 - mem4i, pop
w DE /6     FIDIV mem2i     0 := 0 / mem2i
w DA /6     FIDIV mem4i     0 := 0 / mem4i
w DE /7     FIDIVR mem2i    0 := mem2i / 0
w DA /7     FIDIVR mem4i    0 := mem4i / 0
w DF /0     FILD mem2i      push, 0 := mem2i
w DB /0     FILD mem4i      push, 0 := mem4i
w DF /5     FILD mem8i      push, 0 := mem8i
w DE /1     FIMUL mem2i     0 := 0 * mem2i
w DA /1     FIMUL mem4i     0 := 0 * mem4i
w D9 F7     FINCSTP         increment stack pointer
9B DB E3    FINIT           initialize 87
w DF /2     FIST mem2i      mem2i := 0
w DB /2     FIST mem4i      mem4i := 0
w DF /3     FISTP mem2i     mem2i := 0, pop
w DB /3     FISTP mem4i     mem4i := 0, pop
w DF /7     FISTP mem8i     mem8i := 0, pop
w DE /4     FISUB mem2i     0 := 0 - mem2i
w DA /4     FISUB mem4i     0 := 0 - mem4i
w DE /5     FISUBR mem2i    0 := mem2i - 0
w DA /5     FISUBR mem4i    0 := mem4i - 0
w D9 C0+i   FLD i           push, 0 := old i
w DB /5     FLD mem10r      push, 0 := mem10r
w D9 /0     FLD mem4r       push, 0 := mem4r
w DD /0     FLD mem8r       push, 0 := mem8r
w D9 E8     FLD1            push, 0 := 1.0
w D9 /5     FLDCW mem2i     control word := mem2i
w D9 /4     FLDENV mem14    environment :=
mem14 w D9 EA FLDL2E push, 0 := log base 2.0 of e w D9 E9 FLDL2T push, 0 := log base 2.0 of 10.0 w D9 EC FLDLG2 push, 0 := log base 10.0 of 2.0 w D9 ED FLDLN2 push, 0 := log base e of 2.0 w D9 EB FLDPI push, 0 := Pi w D9 EE FLDZ push, 0 := +0.0 w DE C9 FMUL 1 := 1 * 0, pop (pg 42) w D8 C8+i FMUL i 0 := 0 * i w DC C8+i FMUL i,0 i := i * 0 w D8 C8+i FMUL 0,i 0 := 0 * i w D8 /1 FMUL mem4r 0 := 0 * mem4r w DC /1 FMUL mem8r 0 := 0 * mem8r w DE C8+i FMULP i,0 i := i * 0, pop DB E2 FNCLEX nowait clear exceptions DB E1 FNDISI disable interrupts (.287 ignore) DB E0 FNENI enable interrupts (.287 ignore) DB E3 FNINIT nowait initialize 87 w D9 D0 FNOP no operation DD /6 FNSAVE mem94 mem94 := 87 state D9 /7 FNSTCW mem2i mem2i := control word D9 /6 FNSTENV mem14 mem14 := environment DF E0 FNSTSW AX AX := status word DD /7 FNSTSW mem2i mem2i := status word w D9 F3 FPATAN 0 := arctan(1 / 0), pop w D9 F8 FPREM 0 := REPEAT(0 - 1) D9 F5 FPREM1 387: 0 := REPEAT(0 - 1) IEEE compat. w D9 F2 FPTAN push, 1 / 0 := tan(old 0) w D9 FC FRNDINT 0 := round(0) w DD /4 FRSTOR mem94 87 state := mem94 w DD /6 FSAVE mem94 mem94 := 87 state w D9 FD FSCALE 0 := 0 * (2.0 ** 1) 9B DB E4 FSETPM set protection mode D9 FE FSIN 387: 0 := sine(0) D9 FB FSINCOS 387: push, 1 := sine, 0 := cos(old 0) w D9 FA FSQRT 0 := square root of 0 w DD D0+i FST i i := 0 w D9 /2 FST mem4r mem4r := 0 w DD /2 FST mem8r mem8r := 0 w D9 /7 FSTCW mem2i mem2i := control word w D9 /6 FSTENV mem14 mem14 := environment w DD D8+i FSTP i i := 0, pop w DB /7 FSTP mem10r mem10r := 0, pop w D9 /3 FSTP mem4r mem4r := 0, pop w DD /3 FSTP mem8r mem8r := 0, pop w DF E0 FSTSW AX AX := status word w DD /7 FSTSW mem2i mem2i := status word w DE E9 FSUB 1 := 1 - 0, pop w D8 E0+i FSUB i 0 := 0 - i w DC E8+i FSUB i,0 i := i - 0 w D8 E0+i FSUB 0,i 0 := 0 - i w D8 /4 FSUB mem4r 0 := 0 - mem4r w DC /4 FSUB mem8r 0 := 0 - mem8r w DE E8+i FSUBP i,0 i := i - 0, pop w DE E1 FSUBR 1 := 0 - 1, pop w D8 E8+i FSUBR i 0 := i - 0 w DC E0+i FSUBR i,0 i := 0 - i 
w D8 E8+i FSUBR 0,i 0 := i - 0 w D8 /5 FSUBR mem4r 0 := mem4r - 0 w DC /5 FSUBR mem8r 0 := mem8r - 0 w DE E0+i FSUBRP i,0 i := 0 - i, pop w D9 E4 FTST compare 0 - 0.0 DD E0+i FUCOM i 387: unordered compare 0 - i DD E1 FUCOM 387: unordered compare 0 - 1 DB E9 FUCOMI P6: unord comp 0 - 1, setting integer flags DB E8+i FUCOMI i P6: unord comp 0 - i, setting integer flags DF E9 FUCOMIP P6: unord comp 0 - 1, setting integer flags, pop (pg 43) DF E8+i FUCOMIP i P6: unord comp 0 - i, setting integer flags, pop DD E8+i FUCOMP i 387: unordered compare 0 - i, pop DD E9 FUCOMP 387: unordered compare 0 - 1, pop DA E9 FUCOMPP 387: unordered compare 0 - 1, pop both 9B FWAIT wait for 87 ready w D9 E5 FXAM C3 -- C0 := type of 0 w D9 C9 FXCH exchange 0 and 1 w D9 C8+i FXCH 0,i exchange 0 and i w D9 C8+i FXCH i exchange 0 and i w D9 C8+i FXCH i,0 exchange 0 and i w D9 F4 FXTRACT push, 1 := expo, 0 := sig w D9 F1 FYL2X 0 := 1 * log base 2.0 of 0, pop w D9 F9 FYL2XP1 0 := 1 * log base 2.0 of (0+1.0), pop
A86 supports a variety of formats for numbers. In non-computer life, we write numbers in a decimal format. There are ten digits, 0 through 9, that we use to describe numbers; and each digit position is ten times as significant as the position to its right. The number ten is called the "base" of the decimal format. Computer programmers often find it convenient to use other bases to specify numbers used in their programs. The most commonly-used bases are two (binary format), sixteen (hexadecimal format), and eight (octal format).
The hexadecimal format requires sixteen digits. The extra six digits beyond 0 through 9 are denoted by the first six letters of the alphabet: A for ten, B for eleven, C for twelve, D for thirteen, E for fourteen, and F for fifteen.
In A86, a number must always begin with a digit from 0 through 9, even if the base is hexadecimal. This is so that A86 can distinguish between a number and a symbol that happens to have digits in its name. If a hexadecimal number would begin with a letter, you precede the letter with a zero. For example, hex A0, which is the same as decimal 160, would be written 0A0.
Because it is necessary for you to prepend leading zeroes to many hex numbers, and because you never have to do so for decimal numbers, I decided to make hexadecimal the default base for numbers with leading zeroes. Decimal is still the default base for numbers beginning with 1 through 9.
Large numbers can be given as the operands to DD, DQ, or DT directives. For readability, you may freely intersperse underscore characters anywhere within your numbers.
The default base can be overridden, with a letter or letters at the end of the number: B or xB for binary, O or Q for octal, H for hexadecimal, and D or xD for decimal. Examples.
Value | Description |
---|---|
077Q | octal, value is 8*7 + 7 = 63 in decimal notation |
123O | octal if the "O" is a letter: 64 + 2*8 + 3 = 83 decimal |
1230 | decimal 1230: shows why you should use "Q" for octal!! |
01234567H | large constant |
0001_0000_0000_0000_0003R | real number specified in hexadecimal |
100D | superfluous D indicates decimal base |
0100D | hex number 100D, which is 4096 + 13 = 4109 in decimal |
0100xD | decimal 100, since xD overrides the default hex format |
0110B | hex 110B, which is 4096 + 256 + 11 = 4363 in decimal |
0110xB | binary 4+2 = 6 in decimal notation |
110B | also binary 4+2 = 6, since "B" is not a decimal digit |
The last five examples above illustrate why an "x" is sometimes necessary before the base-override letter "B" or "D". If that letter can be interpreted as a hex digit, it is; the "x" forces an override interpretation for the "B" or "D". By the way, the usage of lower case for x and upper case for the following override letter is simply a recommendation; A86 treats upper- and lower-case letters equivalently.
A86 also accepts a "base" of K. The number preceding the K is interpreted as a decimal number which is multiplied by 1024. Thus, 2K is 2048, 16K is 16384, etc.
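The base rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not A86's actual parser: the function name `parse_a86_int` is hypothetical, it assumes the initial mixed default (no RADIX directive and no +D switch), and it does not handle the R suffix for reals given in hexadecimal.

```python
def parse_a86_int(text):
    """Model of the integer-constant rules described in this chapter (assumed, simplified)."""
    s = text.replace("_", "").upper()      # underscores are ignored; case does not matter
    if not s or not s[0].isdigit():
        raise ValueError("a number must begin with a digit 0 through 9")
    default_hex = s[0] == "0"              # leading zero: hex default; otherwise decimal
    if s.endswith("K"):                    # K: decimal number multiplied by 1024
        return int(s[:-1], 10) * 1024
    if s.endswith("XB"):                   # xB forces binary
        return int(s[:-2], 2)
    if s.endswith("XD"):                   # xD forces decimal
        return int(s[:-2], 10)
    if s.endswith("H"):                    # H: hexadecimal
        return int(s[:-1], 16)
    if s[-1] in "OQ":                      # O or Q: octal
        return int(s[:-1], 8)
    if s[-1] in "BD" and not default_hex:  # B or D is an override only when it
        return int(s[:-1], 2 if s[-1] == "B" else 10)  # cannot be read as a hex digit
    return int(s, 16 if default_hex else 10)


for example in ("077Q", "123O", "0100D", "0100xD", "0110B", "0110xB", "110B", "2K"):
    print(example, "=", parse_a86_int(example))
```

The order of the checks mirrors the text: the x forms are tested before the bare B and D, because a bare B or D after a leading zero is taken as a hex digit.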
The above-mentioned set of defaults (hex if leading zero, decimal otherwise) can be overridden with the RADIX (or, for compatibility, .RADIX) directive. The RADIX directive consists of the word RADIX followed by a number from 2 to 16. The default base for the number is ALWAYS decimal, regardless of any (or no) previous RADIX commands. The number gives the default base for ALL subsequent numbers, up to (but not including) the next RADIX command. If there is no number following RADIX, then A86 returns to its initial mixed default of hex for leading zeroes, decimal for other leading digits.
As an alternative to the RADIX directive, I provide the D switch, which causes A86 to start with decimal defaults. You can put +D into the A86 command invocation, or into the A86 environment variable. The first RADIX command in the program will override the D switch setting.
Following are examples of radix usage. The numbers in the comments are all in decimal notation.
    DB 10,010      ; produces 10,16 if RADIX was not seen yet
                   ;   and +D switch was not specified
    RADIX 10
    DB 10,010      ; produces 10,10
    RADIX 16
    DB 10,010      ; produces 16,16
    RADIX 3        ; for Martian programmers in Heinlein novels
    DB 10,100      ; produces 3,9
    RADIX
    DB 10,010      ; produces 10,16
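The values in the comments above can be checked with a small Python model. This is a sketch under stated assumptions: `value_under_radix` is a hypothetical helper, it handles only plain digit strings without base-override letters, and `radix=None` stands for the initial mixed default (hex for a leading zero, decimal otherwise).

```python
def value_under_radix(text, radix=None):
    """Value of a suffix-less number under a given default base (illustrative only)."""
    if radix is None:
        # Initial mixed default: hex if the number has a leading zero, else decimal.
        radix = 16 if text.startswith("0") else 10
    return int(text, radix)


print(value_under_radix("10"), value_under_radix("010"))          # no RADIX seen yet
print(value_under_radix("10", 3), value_under_radix("100", 3))    # after RADIX 3
```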
A86 allows floating point numbers as the operands to DD, DQ, and DT directives. The numbers are encoded according to the IEEE standard followed by the 87-family coprocessors. The format for floating point constants is as follows: First, there is a decimal number containing a decimal point. There must be a decimal point, or else the number is interpreted as an integer. There must also be at least one decimal digit, either to the left or right of the decimal point, or else the decimal point is interpreted as an addition (structure element) operator. Optionally, there may follow immediately after the decimal number the letter E followed by a decimal number. The E stands for "exponent", and means "times 10 raised to the power of". You may provide a + or - between the E and its number. Examples.
0.1 | constant one-tenth |
.1 | the same |
300. | floating point three hundred |
30.E1 | 30 * 10**1; i.e., three hundred |
30.E+1 | the same |
30.E-1 | 30 * 10**-1; i.e., three |
30E1 | not floating point: hex integer 030E1 |
1.234E20 | scientific notation: 1.234 times 10 to the 20th |
1.234E-20 | a tiny number: 1.234 divided by 10 to the 20th |
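To see what IEEE encoding means for the 4-byte (DD) and 8-byte (DQ) cases, the following Python sketch packs a value with the standard `struct` module and shows the bytes in memory order (low byte first, as on the 86 family). The helper names `dd_bytes` and `dq_bytes` are hypothetical, the 10-byte DT format is not covered because `struct` has no such type, and A86's rounding of inexact decimal constants is assumed, not verified, to agree with Python's.

```python
import struct


def dd_bytes(x):
    """4-byte IEEE single precision, little-endian (illustrative model of a DD real)."""
    return struct.pack("<f", x)


def dq_bytes(x):
    """8-byte IEEE double precision, little-endian (illustrative model of a DQ real)."""
    return struct.pack("<d", x)


# Python's float() happens to accept the same notation as the examples above.
for text in ("300.", "30.E1", "30.E-1", "1.0"):
    value = float(text)
    print(text, value, dd_bytes(value).hex(" "), dq_bytes(value).hex(" "))
```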
Most of the operands that you code into your instructions and data initializations will be simple register names, variable names, or constants. However, you will regularly wish to code operands that are the results of arithmetic calculations, performed either by the machine when the program is running (for indexing), or by the assembler (to determine the value to assemble into the program). A86 has a full set of operators that you can use to create expressions to cover these cases. They are given in the "Descriptions of Operators and Specifiers" section later in this chapter.
A number or constant (16-bit number) can be used in most expressions. A label (defined with a colon) is also treated as a constant and so can be used in expressions.
A variable stands for a byte- or word-memory location. You may add or subtract constants from variables; when you do so, the constant is added to the address of the variable. You typically do this when the variable is the name of a memory array.
An index expression consists of a combination of a base register [BX] or [BP], and/or an index register [SI] or [DI], with an optional constant added or subtracted. You will usually want to precede the bracketed expression with B, W, or D to specify the kind of memory unit (byte, word, or doubleword) you are referring to. The expression stands for the memory unit whose address is the run-time value(s) of the base and/or index registers added to the constant. In addition, A386 allows indexing involving all 8 of the 32-bit registers, with optional scaling by 2, 4, or 8. See the Effective Address section and the beginning of this chapter for more details on indexed memory.
The HIGH and LOW operators are called the "byte isolation" operators. The operand must evaluate to a 16-bit number. HIGH returns the high order byte of the number; LOW the low order byte.
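As a rough illustration of how a 16-bit index expression resolves at run time, the following Python sketch adds the register values and the constant and wraps the sum to 16 bits. The function name `effective_offset` is hypothetical, the register values are passed in as plain integers, and segment registers and the A386 32-bit scaled forms are left out.

```python
def effective_offset(base=0, index=0, disp=0):
    """16-bit offset for an expression such as [BX+SI+disp] (illustrative model)."""
    return (base + index + disp) & 0xFFFF   # offsets wrap around within the segment


# W[BX+SI+50] with BX = 01000 hex and SI = 020 hex
print(hex(effective_offset(base=0x1000, index=0x20, disp=50)))
```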
For example,
    MOV AL,HIGH(01234)      ; AL = 012
    TENHEX EQU LOW(0FF10)   ; TENHEX = 010
The BY operator is a "byte combination" operator. It returns the word whose high byte is the left operand, and whose low byte is the right operand. For example, the expression 3 BY 5 is the same as hexadecimal 0305. The BY operator is exclusive to A86. I added it to cover the following situation: Suppose you are initializing your registers to immediate values. Suppose you want to initialize AH to the ASCII value 'A', and AL to decimal 10. You could code this as two instructions MOV AH,'A' and MOV AL,10; but you realize that a single load into the AX register would save both program space and execution time. Without the BY operator, you would have to code MOV AX,0410A, which disguises the types of the individual byte operands you were thinking about. With BY, you can code it properly: MOV AX,'A' BY 10.
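The arithmetic behind HIGH, LOW, and BY can be written out in Python. This is a sketch of the values the text describes, with hypothetical function names; it says nothing about how A86 implements the operators.

```python
def high(word):
    """High-order byte of a 16-bit number."""
    return (word >> 8) & 0xFF


def low(word):
    """Low-order byte of a 16-bit number."""
    return word & 0xFF


def by(hi_byte, lo_byte):
    """Word whose high byte is the left operand and whose low byte is the right."""
    return ((hi_byte & 0xFF) << 8) | (lo_byte & 0xFF)


print(hex(high(0x1234)), hex(low(0xFF10)))   # the two examples given for HIGH and LOW
print(hex(by(ord("A"), 10)))                 # 'A' BY 10
```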
operand + operand
operand . operand
operand PTR operand
operand operand
As shown in the above syntax, addition can be accomplished in four ways: with a plus sign, with a dot operator, with a PTR operator, and simply by juxtaposing two operands next to each other. The dot and PTR operators are provided for compatibility. The dot is used in structure field notation; PTR is used in expressions such as BYTE PTR 0. (See Chapter 12 for recommendations concerning PTR.)
If either operand is a constant, the answer is an expression with the typing of the other operand, with the offsets added. For example, if BVAR is a byte variable, then BVAR + 100 is the byte variable 100 bytes beyond BVAR.
Other examples.
    DB 100+17          ; simple addition
    CTRL EQU -040
    MOV AL,CTRL'D'     ; a nice notation for control-D!
    MOV DX,[BP].SMEM   ; --where SMEM was in an unindexed structure
    DQ 10.0 + 7.0      ; floating point addition
operand - operand
The subtraction operator may have operands that are.
The result is an absolute number; the difference between the two operands.
Subtraction is also allowed between floating point numbers; the answer is the floating point difference.
operand * operand
operand / operand
operand MOD operand
You may use the * and / operators with either absolute or floating-point numbers. The MOD operator returns the remainder result of the division of its absolute operands. A386 also allows you to multiply a 32-bit register by a scale factor of 2, 4, or 8 for indexing, as described in Chapter 6. Examples.
    CMP AL,2 * 4      ; compare AL to 8
    MOV BX,0123/16    ; BX = 012
    DT 1.0 / 7.0
operand SHR operand
operand SHL operand
BIT operand
The shift operators perform a "bit-wise" shift of the left operand. The left operand is shifted by the number of bits given by the right operand (the "count"), either to the right (SHR) or to the left (SHL). Bits shifted into the operand are set to 0.
The expression "BIT count" is equivalent to "1 SHL count"; i.e., BIT returns the mask of the single bit whose number is "count". The operands must be numeric expressions that evaluate to absolute numbers. Examples.
    MOV BX, 0FACBH SHR 4   ; shift right: BX = 0FACH
    MOV BX, 0FACBH SHL 4   ; shift left: BX = 0ACB0H
    OR AL,BIT 6            ; AL = AL OR 040; 040 is the mask for bit 6
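The three examples can be reproduced in Python. The sketch assumes 16-bit results (as in A86, not A386), and the function names are hypothetical.

```python
MASK16 = 0xFFFF   # assumed 16-bit arithmetic


def shr(value, count):
    """Shift right, filling with zeroes."""
    return (value & MASK16) >> count


def shl(value, count):
    """Shift left, discarding bits that leave the 16-bit word."""
    return (value << count) & MASK16


def bit(count):
    """Mask of the single bit numbered count; the same as 1 SHL count."""
    return shl(1, count)


print(hex(shr(0xFACB, 4)), hex(shl(0xFACB, 4)), hex(bit(6)))
```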
operand OR operand
operand XOR operand
operand AND operand
NOT operand
The logical operators may only be used with absolute numbers. They always return an absolute number.
Logical operators operate on individual bits. Each bit of the answer depends only on the corresponding bit in the operand(s).
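The functions performed are as follows. OR sets a result bit to 1 if the corresponding bit is 1 in either operand. XOR sets a result bit to 1 if the corresponding bits of the two operands differ. AND sets a result bit to 1 only if the corresponding bit is 1 in both operands. NOT reverses each bit of its single operand.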
Examples.
    11110000xB OR 00110011xB     evaluates to 11110011xB
    11110000xB XOR 00110011xB    evaluates to 11000011xB
    11110000xB AND 00110011xB    evaluates to 00110000xB
    NOT 00110011xB               evaluates to 11001100xB
! operand
The exclamation-point operator, rather than reversing each individual bit of the operand, considers the entire operand as a boolean variable to be negated. If the operand is non-zero (any of the bits are 1), the answer is 0. If the operand is zero, the answer is 0FFFF (0FFFF_FFFF in A386).
Because ! is intended to be used in conditional assembly expressions (described in Chapter 11), there is also a special action when ! is applied to an undefined name: the answer is the defined value 0FFFF, meaning it is TRUE that the symbol is undefined. Similarly, when ! is applied to some defined quantity other than an absolute constant, the answer is 0, meaning it is FALSE that the operand is undefined.
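The difference between the bit-by-bit NOT and the boolean ! operator can be seen in a short Python model. It assumes 16-bit values (A86 rather than A386), uses hypothetical function names, and covers only absolute numbers, not the special treatment of undefined names.

```python
MASK16 = 0xFFFF   # assumed 16-bit arithmetic


def a86_not(x):
    """NOT: reverse every bit of the 16-bit operand."""
    return ~x & MASK16


def a86_bang(x):
    """!: treat the whole operand as a boolean and negate it."""
    return 0 if (x & MASK16) else MASK16


a, b = 0b11110000, 0b00110011
print(format(a | b, "08b"), format(a ^ b, "08b"), format(a & b, "08b"))
print(format(a86_not(b) & 0xFF, "08b"))         # low byte of NOT, as in the example
print(hex(a86_bang(0)), hex(a86_bang(b)))       # TRUE for zero, FALSE for non-zero
```

In this 16-bit model the full result of NOT 00110011xB is 0FFCC; the example in the text shows only the low eight bits.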
operand EQ operand
operand LT operand
operand LE operand
operand NE operand
operand GT operand
operand GE operand
The relational operators may have operands that are either both absolute numbers, or both variable names that have the same type. The result of a relational operation is always an absolute number. They return an 8- or 16-bit result of all 1's for TRUE and all 0's for FALSE. Examples.
    MOV AL, 3 EQ 0     ; AL = 0 (false)
    MOV AX, 2 LE 15    ; AX = 0FFFFH (true)
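The all-ones and all-zeroes convention can be modelled as follows. The function name `relational` and its `bits` parameter are hypothetical, and the comparison is done on plain Python integers, so questions of signedness are not modelled.

```python
import operator

_RELOPS = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}


def relational(a, op, b, bits=16):
    """All 1's (in the given width) for TRUE, all 0's for FALSE."""
    true_value = (1 << bits) - 1
    return true_value if _RELOPS[op](a, b) else 0


print(hex(relational(3, "EQ", 0, bits=8)))   # destination AL: 8-bit result
print(hex(relational(2, "LE", 15)))          # destination AX: 16-bit result
```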
string EQ string
string NE string
string = string
In order to subsume the string comparison facilities offered by MASM's special conditional-assembly directives IFIDN and IFDIF, A86 allows the relational operators EQ and NE to accept string arguments. For this syntax to be accepted by A86, both strings must be bounded using the same delimiter (either single quotes for both strings, or double quotes for both strings). For a match (EQ returns TRUE or NE returns FALSE), the strings must be the same length, and every character must match exactly.
An additional A86-exclusive feature is the = operator, which returns TRUE if the characters of the strings differ only in the bit masked by the value 020. Thus you may use = to compare a macro parameter to a string containing nothing but letters. The comparison will be TRUE whether the macro parameter is upper-case or lower-case. No checking is made to detect non-letters, so if you use = on strings containing non-letters, you may get some false TRUE results. Also, = is accepted when it is applied to non-strings as well -- the corresponding values are interpreted as two-byte strings (4-byte strings in A386), with the 020 bits from each byte masked away before comparison.
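The behaviour of = on strings, including the false TRUE results on non-letters, can be illustrated in Python. This is a sketch with a hypothetical function name; it assumes the two strings must have the same length, as stated for EQ and NE.

```python
def loose_equal(a, b):
    """TRUE if the strings differ at most in the 020 bit of each character (assumed model)."""
    if len(a) != len(b):
        return False
    # XOR exposes the differing bits; clearing bit 020 ignores a pure case difference.
    return all(((ord(x) ^ ord(y)) & ~0x20) == 0 for x, y in zip(a, b))


print(loose_equal("near", "NEAR"))   # letters differing only in case
print(loose_equal("@", "`"))         # non-letters 040 and 060: a false TRUE
```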
memletter operand
operand memletter
where memletter is one of: B W D F Q T P
B, W, D, F, Q, T, and P convert the operand into a byte, word, doubleword, far, quadword, ten-byte variable, and (on A386) a 6-byte far pointer, respectively. The operand can be a constant, or a variable of another type. Examples.
    ARRAY_PTR: DB 100 DUP (?)
    WVAR DW ?
    MOV AL,ARRAY_PTR B   ; load first byte of ARRAY_PTR array into AL
    MOV AL,WVAR B        ; load the low byte of WVAR into AL
    MOV AX,W[01000]      ; load AX with the memory word at loc. 01000
    LDS BX,D[01000]      ; load DS:BX with the doubleword at loc. 01000
    JMP F OUTSIDE_LOC    ; jump to undeclared far location OUTSIDE_LOC
    FLD T[BX]            ; load ten-byte number at [BX] to 87 stack
For compatibility, A86 accepts the more verbose synonyms BYTE, WORD, DWORD, FAR, QWORD, and TBYTE for B,W,D,F,Q,T, respectively.
The SHORT operator is used to specify that the label referenced by a JMP instruction is within 127 bytes of the end of the instruction. The LONG operator specifies the opposite: that the label is not within 127 bytes. The appropriate operator can (and sometimes must) be used if the label is forward referenced in the instruction.
When a non-local label is forward referenced, the assembler assumes that it will require two bytes to represent the relative offset of the label (so the instruction including the opcode byte will be three bytes). By correctly using the SHORT operator, you can save a byte of code when you use a forward reference. If the label is not within the specified range, an error will occur. The following example illustrates the use of the SHORT operator.
    JMP FWDLAB         ; three byte instruction
    JMP SHORT FWDLAB   ; two byte instruction
    JMP >L1            ; two byte instruction assumed for a local label
Because the assembler assumes that a forward reference local label is SHORT, you may sometimes be forced to override this assumption if the label is in fact not within 127 bytes of the JMP. This is why LONG is provided.
JMP LONG >L9 ; three byte instruction
If you are bothered by this possibility, you can specify the +G switch, which causes A86 to pessimistically generate the three byte JMP for all forward references, unless specifically told not to, with SHORT.
NOTE that for A86, LONG will have effect only on the operand to an unconditional JMP instruction; not to conditional jumps. That is because conditional jumps farther than 127 bytes are available only on the 386 and later processors. If you run into this problem, then chances are your code is getting out of control--time to rearrange, or to break off some of the intervening code into separate procedures. If you insist upon leaving the code intact, you can replace the conditional jump with an "IF cond JMP".
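The size decision for an unconditional near JMP can be sketched in Python. The function name `jmp_length` is hypothetical, and the sketch relies on the processor's short-jump encoding, in which the displacement is a signed byte (-128 to +127) measured from the end of the two-byte instruction.

```python
def jmp_length(jmp_addr, target):
    """Bytes needed for a near JMP at jmp_addr to target (illustrative model)."""
    disp = target - (jmp_addr + 2)          # relative to the end of a 2-byte JMP
    return 2 if -128 <= disp <= 127 else 3  # otherwise opcode plus 2-byte offset


print(jmp_length(0x0100, 0x0140))   # nearby label: SHORT form fits
print(jmp_length(0x0100, 0x0400))   # distant label: LONG form needed
```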
OFFSET is used to convert a variable into the constant pointer to the variable. For example, if you have declared XX DW ?, and you want to load SI with the pointer to the variable XX, you can code: MOV SI,OFFSET XX. The simpler instruction MOV SI,XX moves the variable contents of XX into SI, not the constant pointer to XX.
NEAR converts the operand to have the type of a code label, as if it were defined by appearing at the beginning of a program line with a colon after it. NEAR is provided mainly for compatibility.
[operand]
Square brackets around an operand give the operand a memory variable type. Square brackets are generally used to enclose the names of base and index registers: BX, BP, SI, and DI. When the size of the memory variable can be deduced from the context of the expression, square brackets are also used to turn numeric constants into memory variables. Examples.
    MOV B[BX+50],047   ; move imm value 047 into mem byte at BX+50
    MOV AL,[050]       ; move byte at memory location 050 into AL
    MOV AL,050         ; move immediate value 050 into AL
const:operand
segreg:operand
segname:operand
The colon operator is used to attach a segment register value to an operand. The segment register value appears to the left of the colon; the rest of the operand appears to the right of the colon.
There are three forms to the colon operator. The first form has a constant as the segment register value. This form is used to create an operand to a far (inter-segment) JMP or CALL instruction. An example of this is the instruction JMP 0FFFF:0, which jumps to the cold-boot reset location of the 86 processor.
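For readers unfamiliar with segmented addresses, the following Python sketch shows where a segment:offset pair such as 0FFFF:0 lands in the 86 processor's 20-bit real-mode address space. The function name `physical_address` is hypothetical.

```python
def physical_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0xFFFF, 0)))   # target of JMP 0FFFF:0
```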
The second form has a segment register name to the left of the colon. This is the segment override form, provided for compatibility. A86 will generate a segment override byte when it sees this form, unless the operand to the right of the colon already has a default segment register that is the same as the given override.
I prefer the more explicit method of overrides, exclusive to A86: simply place the segment register name before the instruction mnemonic. For example, I prefer ES MOV AL,[BX] to MOV AL,ES:[BX].
The third form has a segment or group name before the colon. This form is handled in .OBJ mode when there is a group name before the colon, and an external symbol after. In that case, the group override is necessary for the linker to produce correct code. In other cases, the override is not necessary and is ignored by A86.
ST is ignored whenever it occurs in an expression. It is provided for compatibility with Intel and IBM assemblers. For example, you can code FLD ST(0),ST(1), which will be taken by A86 as FLD 0,1.
The REF operator returns a value of -1 (true) if the operand is a symbol that has been referenced, 0 (false) if it hasn't. Appearance as an operand to another REF or DEF does not count as a reference.
The DEF operator returns a value of true if the operand has been defined previously in the assembly, false if it hasn't.
REF and DEF are most often used within parameters to an IF conditional-assembly construct.
The TYPE operator returns 1 if the operand is a byte variable; 2 if the operand is a word variable; 4 if the operand is a doubleword variable; 8 if the operand is a quadword variable; 10 if the operand is a ten-byte variable; 0 if the operand is a constant, and the number of bytes allocated by the structure if the operand is a structure name (see STRUC in the next chapter).
A common usage of the TYPE operator is to represent the number of bytes of a named structure. For example, if you have declared a structure named LINE (as described in the next chapter) that defines 82 bytes of storage, then two ways you might refer to the value symbolically are as follows.
    MOV CX,TYPE LINE     ; loads the size of LINE into CX
    DB TYPE LINE DUP ?   ; allocates an area of memory for a LINE
THIS returns the value of the current location counter. It is provided for compatibility. The dollar sign $ is the more standard and familiar specifier for this purpose; it is equivalent to THIS NEAR. THIS is typically used with the BYTE and WORD specifiers to create alternate-typed symbols at the same memory location.
    BVAR EQU THIS BYTE
    WVAR DW ?
I don't recommend the use of THIS. If you wish to retain Intel compatibility, you can use the less verbose LABEL directive.
    BVAR LABEL BYTE
    WVAR DW ?
If you are not concerned with compatibility to lesser assemblers, A86 offers a variety of less verbose forms. The most concise is DB without an operand.
    BVAR DB
    WVAR DW ?
If this is too cryptic for you, there is always BVAR EQU B[$].
ELSE default_op
new_op ELSE default_op
The ELSE operator is an easy way to provide defaults to macro operands. The first form, with only default_op to the right of ELSE, returns default_op. The second form, with operands on both sides, returns the left operand new_op. For example, if you code MOV AX,#3 ELSE 100 in a macro definition, the instruction will load AX with the third macro operand; with a default of 100 if no such operand is provided in the macro call.
Consider the expression 1+2*3. When A86 sees this expression, it could perform the multiplication first, giving an answer of 1+6=7; or it could do the addition first, giving an answer of 3*3=9. In fact, A86 does the multiplication first, because A86 assigns a higher precedence to multiplication than it does addition.
The following list specifies the order of precedence A86 assigns to expression operators. All expressions are evaluated from left to right following the precedence rules. You may override this order of evaluation and precedence through the use of parentheses ( ). In the example above, you could override the precedence by parenthesizing the addition: (1+2)*3.
Some symbols that we have referred to as operators are treated by the assembler as operands having built-in values. These include $ and ST. In a similar vein, a segment override term (a segment register name followed by a colon) is recorded when it is scanned, but not acted upon until the entire containing expression is scanned and evaluated. The size operators B, W, D, F, Q, T, and P are also recorded and applied after scanning and evaluation.
If two operators are adjacent, the rightmost operator must have precedence; otherwise, parentheses must be used. For example, the expression BIT ! 1 is illegal because the leftmost operator BIT has the higher precedence of the two adjacent operators BIT and "!". You can code BIT (! 1).
The following discussion applies when A86 is assembling a .COM file. See the next chapter for the discussion of segmentation for .OBJ files.
A86 views the 86 computer's memory space as having two parts: The first part is the program, whose contents are the object bytes generated by A86 during its assembly of the source. A86 calls this area the CODE SEGMENT. The second part is the data area, whose contents are generated by the program after it starts running. A86 calls this area the DATA SEGMENT.
Please note well that the only difference between the CODE and DATA segments is whether the contents are generated by the program or the assembler. The names CODE and DATA suggest that program code is placed in the CODE segment, and data structures go in the DATA segment. This is mostly true, but there are exceptions. For example, there are many data structures whose contents are determined by the assembler: pointer tables, arrays of pre-defined constants, etc. These tables are assembled in the CODE segment.
In general, you will want to begin your program with the directive DATA SEGMENT, followed by all your program variables and uninitialized data structures, using the directives DB, DW, and STRUC. If you do not give an ORG directive, A86 will begin the allocation immediately following the end of the .COM program. You can end the DATA SEGMENT allocation lines with the DATA ENDS directive, followed by the program code itself. A short program illustrating this suggested usage follows.
    DATA SEGMENT
        ANSWER_BYTE DB ?
        CALL_COUNT DW ?

    CODE SEGMENT
        JMP MAIN

    TRAN_TABLE:
        DB 16,3,56,23,0,9,12,7

    MAIN:
        MOV BX,TRAN_TABLE
        XLATB
        MOV ANSWER_BYTE,AL
        INC CALL_COUNT
        RET
A86 allows you to intersperse CODE SEGMENTs and DATA SEGMENTs throughout your program; but in general it is best to put all your DATA SEGMENT declarations at the top of your program, to avoid problems with forward referencing.
For compatibility with Intel/IBM assemblers, A86 provides the CODE ENDS and DATA ENDS statements. The CODE ENDS statement is ignored; we assume that you have not nested a CODE segment inside a DATA segment. The DATA ENDS statement is equivalent to a CODE SEGMENT statement.
ORG moves the output pointer (the location counter at which assembly is currently taking place within the current segment) to the value of the operand. In the CODE segment, the operand should be an absolute constant, or an expression evaluating to an absolute, non-forward-referenced constant. In the DATA segment, the operand may be a forward reference or an expression containing one or more forward references. All symbols in the segment will be resolved when the forward references to the ORG operand are all resolved.
There is a special side effect to ORG when it is used in the CODE segment. If you begin your code segment with ORG 0, then A86 knows that you are not assembling a .COM program; but are instead assembling a code segment to be used in some other context (examples: programming a ROM, or assembling a procedure for older versions of Turbo Pascal). The output file will start at 0, not 0100 as in a .COM file; and the default extension for the output file will be .BIN, not .COM. However, if you later issue an ORG 0100 directive, the default will revert back to .COM.
Other than in the above example, you should not in general issue an ORG within the CODE segment that would lower the value of the output pointer. This is because you thereby put yourself in danger of losing part of your assembled program. If you re-assemble over space you have already assembled, you will clobber the previously-assembled code. Also, be aware that the size of the output program file is determined by the value of the code segment output pointer when the program stops. If you ORG to a lower value at the end of your program, the output program file will be truncated to the lower-value address.
Again, almost no program producing a .COM file will need any ORG directive in the code segment. There is an implied ORG 0100 at the start of the program. You just start coding instructions, and the assembler will put them in the right place.
The EVEN directive coerces the current output pointer to a value which is an exact multiple of the operand. If no operand is given, a value of 2 is assumed. In a DATA SEGMENT or STRUC, it does so by adding to the current output pointer if necessary. In a code segment, it outputs an appropriate number of NOP instruction bytes. EVEN is most often used in data segments, before a sequence of DW directives. Machines beyond the original 8088 fetch words more quickly when they are aligned onto even addresses; so the EVEN directive ensures that your program will have the faster access to those DW's that follow it. Also useful are EVEN 4 for doubleword alignment, and EVEN 16 for paragraph alignment. Be aware, though, that if you use the EVEN directive in .OBJ mode, the containing SEGMENT directive should have an alignment type at least as great as your EVEN operand, to achieve the desired alignment at its final memory location.
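The rounding that EVEN performs on the output pointer can be written out as a short Python sketch. The function name `even_pad` is hypothetical; it returns both the aligned pointer and the number of filler bytes, which in a code segment would be NOPs.

```python
def even_pad(pointer, multiple=2):
    """Return (aligned_pointer, filler_bytes) for an EVEN with the given operand."""
    filler = (-pointer) % multiple   # bytes needed to reach the next exact multiple
    return pointer + filler, filler


print(even_pad(0x0103))        # odd pointer: one filler byte
print(even_pad(0x0104))        # already even: nothing added
print(even_pad(0x0101, 16))    # EVEN 16: pad up to the next paragraph
```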
The 86 computer family supports the three fundamental data types BYTE, WORD, and DWORD. A byte is eight bits, a word is 16 bits (2 bytes), and a doubleword is 32 bits (4 bytes). In addition, the 87 floating point processor manipulates 8-byte quantities, which we call Q-words, and 10-byte quantities, which we call T-bytes. A386 also allows a data type P for a 6-byte far pointer with a 32-bit offset. The A86 data allocation statement is used to specify the bytes, words, doublewords, Q-words, T-bytes, and 6-byte far pointers which your program will use as data. The syntax for the data allocation statement is as follows.
optional_var_name DB list of values
optional_var_name DW list of values
optional_var_name DD list of values
optional_var_name DQ list of values
optional_var_name DT list of values
optional_var_name DP list of values
The variable name, if present, causes that name to be entered into the symbol table as a memory variable with type BYTE (for DB), WORD (for DW), DWORD (for DD), QWORD (for DQ), TBYTE (for DT), or P (for DP). The variable name should NOT have a colon after it, unless you wish the name to be a label (instructions referring to it will interpret the label as the constant pointer to the memory location, not its contents).
The DB statement is used to reserve bytes of storage; DW is used to reserve words. The list of values to the right of the DB or DW serves two purposes. It specifies how many bytes or words are allocated by the statement, as well as what their initial values should be. The list of values may contain a single value or more than one, separated by commas. The list can even be missing; meaning that we wish to define a byte or word variable at the same location as the next variable.
If the data initialization is in the DATA segment, the values given are ignored, except as place markers to reserve the appropriate number of units of storage. The use of "?", which in .COM mode is a synonym for zero, is recommended in this context to emphasize the lack of actual memory initialization. When A86 is assembling .OBJ files, the ?-initialization will cause a break in the segment (unless ? is embedded in a nested DUP containing non-? terms, in which case it is a synonym for zero).
A special value which can be used in data initializations is the DUP construct, which allows the allocation and/or initialization of blocks of data. The expression n DUP x is equivalent to a list with x repeated n times. "x" can be either a single value, a list of values, or another DUP construct nested inside the first one. The nested DUP construct needs to be surrounded by parentheses. All other assemblers, and earlier versions of A86, require parentheses around all right operands to DUP, even simple ones; but this requirement has been removed for simple operands in the current A86.
Here are some examples of data initialization statements, with and without DUP constructs.
  CODE SEGMENT
    DW 5                   ; allocate one word, init. to 5
    DB 0,3,0               ; allocate three bytes, init. to 0,3,0
    DB 5 DUP 0             ; equivalent to DB 0,0,0,0,0
    DW 2 DUP (0,4 DUP 7)   ; equivalent to DW 0,7,7,7,7,0,7,7,7,7

  DATA SEGMENT
    XX      DW ?           ; define a word variable XX
    YYLOW   DB             ; no init value: YYLOW is low byte of word var YY
    YY      DW ?
    X_ARRAY DB 100 DUP ?   ; X_ARRAY is a 100-byte array
    D_REAL  DQ ?           ; double precision floating variable
    EX_REAL DT ?           ; extended precision floating variable
A character string value, enclosed in either single-quotes or double-quotes, may be used to initialize consecutive bytes in a DB statement. Each character will be represented by its ASCII code. The characters are stored in the order that they appear in the string, with the first character assigned to the lowest-addressed byte. In the statement DB 'HELLO', five bytes are initialized with the ASCII representation of the characters in the string 'HELLO'.
Note that except for string comparisons described in the previous chapter, the DB directive is the only place in your program that strings of length greater than 2 (for A386, length greater than 4) may occur. In all other contexts (including DW), a string is treated as the constant number representing the ASCII value of the string; for example, CMP AL,"@" is the instruction comparing the AL register with the ASCII value of the at-sign. Note further that 2-character string constants, like all constants in the 8086, have their bytes reversed. Thus, while DB 'AB' will produce hex 41 followed by hex 42, the similar looking DW 'AB' reverses the bytes: hex 42 followed by hex 41.
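The byte orders described above can be summarized in a short sketch. The byte values in the comments are the standard ASCII codes, written in hex.

```asm
  DB 'HELLO'       ; five bytes: 48 45 4C 4C 4F
  DB 'AB'          ; two bytes in string order: 41 42
  DW 'AB'          ; one word constant, stored low byte first: 42 41
  CMP AL,"@"       ; the string is a constant here: compares AL with hex 40
```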
The DD directive is used to initialize 32-bit doubleword pointers to locations in arbitrary segments of the 86's memory space. Values for such pointers are given by two numbers separated by a colon. The segment register value appears to the left of the colon; and the offset appears to the right of the colon. In keeping with the reversed-bytes nature of memory storage in the 86 family, the offset comes first in memory. For example, the statement DD 01234:05678 appearing in a CODE segment will cause the hex bytes 78 56 34 12 to be generated, which is a long pointer to segment 01234, offset 05678.
DD, DQ, and DT can also be used to initialize large integers and floating point numbers. Examples.
  DD 500000    ; half million, too big for most of A86, OK for A386
  DD 3.5       ; single precision floating point number
  DQ 3.5       ; the same number in a double precision format
  DT 3.5       ; the same number in an extended precision format
The STRUC directive is used to define a template of data to be addressed by one of the 8086's base and/or index registers. The syntax of STRUC is as follows.
optional_strucname STRUC optional_effective_address
The optional structure name given at the beginning of the line can appear in subsequent expressions in the program, with the operator TYPE applied to it, to yield the number of bytes in the structure template.
The STRUC directive causes the assembler to enter a mode similar to DATA SEGMENT: assembly within the structure declares symbols (the elements of the structure), using a location counter that starts out at the address following STRUC. If no address is given, assembly starts at location 0. An option not available to the DATA SEGMENT is that the address can include bracketed index registers, in any legal form as described in Chapter 6. For example.
  LINE STRUC [BP]        ; the template starts at [BP]
    DB 80 DUP (?)        ; these 80 bytes advance us to [BP+80]
    LSIZE DB ?           ; this 1 byte advances us to [BP+81]
    LPROT DB ?
  ENDS
The STRUC just given defines the variables LSIZE, equivalent to B[BP+80], and LPROT, equivalent to B[BP+81]. You can now issue instructions such as MOV AL,LSIZE; which automatically generates the correct indexing for you.
The mode entered by STRUC is terminated by the ENDS directive, which returns the assembler to whatever segment (CODE or DATA) it was in before the STRUC, with the location counter restored to its value within that segment before the STRUC was declared.
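To tie the pieces together, here is a sketch that declares the LINE template and then uses its elements and its TYPE. The instructions after ENDS are illustrative only; the equivalences in the comments follow from the description above.

```asm
LINE STRUC [BP]        ; the same template as in the example above
  DB 80 DUP (?)
  LSIZE DB ?
  LPROT DB ?
ENDS                   ; back to the segment we were in before the STRUC

  MOV AL,LSIZE         ; equivalent to MOV AL,B[BP+80]
  MOV LPROT,AL         ; equivalent to MOV B[BP+81],AL
  MOV CX,TYPE LINE     ; TYPE of the structure name: its size in bytes
```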
A86 allows names for a variety of program elements to be forward referenced. This means that you may use a symbol in one statement and define it later with another statement. For example.
    JNZ TARGET
    .
    .
  TARGET:
    ADD AX,10
In this example, a conditional jump is made to TARGET, a label farther down in the code. When JNZ TARGET is seen, TARGET is undefined, so this is a forward reference.
Earlier versions of A86 were much more restricted in the kinds of forward references allowed. Almost all of the restrictions have now been eased, for convenience as well as compatibility with other assemblers. In particular, you may now make forward references to variable names. You just need to see to it that A86 has enough information about the type of the operand to generate the correct instruction. For example, MOV FOO,AL will cause A86 to correctly deduce that FOO is a byte variable. You can even code a subsequent MOV FOO,1 and A86 will remember that FOO was assumed to be a byte variable. But if you code MOV FOO,1 first, A86 won't know whether to issue a byte or a word MOV instruction; and will thus issue an error. You then specify the type by MOV FOO B,1.
In general, A86's compatibility with other assemblers has improved dramatically for forward references. You'll need only sprinkle a very few B's and W's into your references. And you'll be rewarded: in many cases the word form is longer than the byte form, so that other assemblers wind up inserting a wasted NOP in your program. You'll wind up with tighter code by using A86!
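The typing rules for forward-referenced variables might look like this in practice. FOO and BAR are hypothetical variables, and the W specifier is used by analogy with the B form shown above.

```asm
  MOV FOO,AL       ; AL is a byte register, so FOO is assumed to be a byte variable
  MOV FOO,1        ; accepted: A86 remembers the byte assumption
  MOV BAR W,1      ; BAR has not been seen yet, so W states that it is a word

DATA SEGMENT
  FOO DB ?         ; the definitions arrive after the references
  BAR DW ?
```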
A86 now allows you to include any number of forward-reference symbols in expressions of arbitrary complexity. If the expression is legal when the forward references are resolved, then it will be accepted by the assembler.
A86 will also accept the reserved symbol END as a forward-reference quantity, either by itself as an operand, or within an expression. END will be resolved when assembly is complete, as a label pointing to the end of the program.
For example, suppose you wish to advance the ES segment register to point immediately beyond your program. You can code.
  MOV AX,CS            ; fetch the program's segment value
  ADD AX,(END+15)/16   ; add in the number of paragraphs
  MOV ES,AX            ; ES is now loaded as desired
The EQU directive allows you to define a symbol to be synonymous with any constant, other symbol, or expression that you wish. To the right of the EQU, you can specify an operand of any type that could appear as an operand to an instruction.
For example, suppose you are writing a program that manipulates a table containing 100 names and that you want to refer to the maximum number of names throughout the source file. You can, of course, use the number 100 to refer to this maximum each time, as in MOV CX,100, but this approach suffers from two weaknesses. First of all, 100 can mean a lot of things; in the absence of comments, it is not obvious that a particular use of 100 refers to the maximum number of names. Secondly, if you extend the table to allow 200 names, you will have to locate each 100 and change it to a 200. Suppose, instead, that you define a symbol to represent the maximum number of names with the following statement.
MAX_NAMES EQU 100
Now when you use the symbol MAX_NAMES instead of the number 100 (for example, MOV CX,MAX_NAMES), it will be obvious that you are referring to the maximum number of names in the table. Also, if you decide to extend the table, you need only change the 100 in the EQU directive to a 200 and every reference to MAX_NAMES will reflect the change.
You could also take advantage of A86's strong typing, by changing MAX_NAMES to a variable.
MAX_NAMES DB ?
or even an indexed quantity.
MAX_NAMES EQU [BX+1]
Because the A86 language is strongly typed, the instruction for loading MAX_NAMES into the CX register remains exactly the same in all cases: simply MOV CX,MAX_NAMES.
A86 allows you to define synonyms for any of the assembler reserved symbols, by EQUating an alternate name of your choosing, to that symbol. For example, suppose you were coding a source module that is to be incorporated into several different programs. In some programs, a certain variable will exist in the code segment. In others, it will exist in the stack segment. You want to address the variable in the common source module, but you don't know which segment override to use. The solution is to declare a synonym, QS, for the segment register. QS will be defined by each program: the code-segment program will have a QS EQU CS at the top of it; the stack-segment program will have QS EQU SS. The source module can use QS as an override, just as if it were CS or SS. The code would be, for example, QS MOV AL,VARNAME.
A86 provides a mnemonic, NIL, that generates no code. NIL can be used as a prefix to another instruction (which will have no effect on that instruction), or it can appear by itself on a line. NIL is provided to extend the example in the previous section, to cover the possibility of no overrides. If your source module goes into a program that fits into 64K, so that all the segment registers have the same value, then code QS EQU NIL at the top of that program.
A86 allows you to equate your own name to an INT instruction with a specific interrupt number. For example, if you place TRAP EQU INT 3 at the top of your program, you can use the name TRAP as a synonym for INT 3 (the debugger trap on the 8086).
A86 contains the unique feature of duplicate definitions. We have already discussed local symbols, which can be redefined to different values without restriction. Local symbols are the only symbols that can be redefined. However, any symbol can be defined more than once, as long as the symbol is defined to be the same value and type in each definition.
This feature has two uses. First, it eases modular program development. For example, if two independently-developed source files both use the symbol ESC to stand for the ASCII code for ESCAPE, they can both contain the declaration ESC EQU 01B, with no problems if they are combined into the same program.
The second use for this feature is assertion checking. Your deliberate redeclaration of a symbol name is an assertion that the value of the symbol has not changed; and you want the assembler to issue you an error message if it has changed. Example: suppose you have declared a table of options in your DATA segment; and you have another table of initial values for those options in your CODE segment. If you come back months later and add an option to your tables, you want to be reminded to update both tables in the same way. You should declare your tables as follows.
  DATA SEGMENT
  OPTIONS:
    .
    .
  OPT_COUNT EQU $-OPTIONS     ; OPT_COUNT is the size of the table

  CODE SEGMENT
  OPT_INITS:
    .
    .
  OPT_COUNT EQU $-OPT_INITS   ; second OPT_COUNT had better be the same!
The equals sign directive is provided for compatibility. It is identical to the EQU directive, with one exception: if the first time a symbol appears in a program is in an = directive, that symbol will be taken as a local symbol. It can be redefined to other values, just like the generic local symbols (letter followed by digits) that A86 supports. (If you try to redefine an EQU symbol to a different value, you get an error message.) The = facility is most often used to define "assembler variables", that change value as the assembly progresses.
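A sketch of an "assembler variable" follows. The name SLOT is made up, and the self-referencing redefinition assumes that = accepts an expression containing the symbol's current value, as it does in the assemblers this directive is compatible with.

```asm
SLOT = 0           ; first appearance is in an = directive, so SLOT is redefinable
  DB SLOT          ; assembles a byte containing 0
SLOT = SLOT+1      ; redefine SLOT from its current value (assumed to be allowed)
  DB SLOT          ; assembles a byte containing 1
```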
PROC is a directive provided for compatibility with Intel/IBM assemblers. I don't like PROC; and I recommend that you do not use it, even if you are programming for those assemblers.
The idea behind PROC is to give the assembler a mechanism whereby it can decide for you what kind of RET instruction you should be providing. If you specify NEAR in your PROC directive, then the assembler will generate a near (same segment) return when it sees RET. If you specify FAR in your PROC directive, the assembler will generate a far RETF return (which will cause both IP and CS to be popped from the stack). If you simply leave well enough alone, and never code a PROC in your program, then RET will mean near return throughout your program.
The reason I don't like PROC is because it is yet another attempt by the assembler to do things "behind your back". This goes against the reason why you are programming in assembly language in the first place, which is to have complete control over the code generated by your source program. It leads to nothing but trouble and confusion.
Another problem with PROC is its verbosity. It replaces a simple colon, given right after the label it defines. This creates a visual clutter in the program, that makes the program harder to read.
A86 provides an explicit RETF mnemonic so that you don't need to use PROC to distinguish between near and far return instructions. You can use RET or RETN for a near return and RETF for a far return.
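For instance, two routines can sit side by side with no PROC at all, each stating its own return type. The labels and bodies here are placeholders.

```asm
NEAR_SUB:            ; meant to be reached by a near CALL within this segment
  INC AX
  RET                ; near return (RETN would do the same)

FAR_SUB:             ; meant to be reached by a far CALL
  INC AX
  RETF               ; far return: pops both IP and CS
```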
The only action A86 takes when it sees an ENDP directive is to return the assembler to the near/far state it was in before the last PROC directive.
where memletter is one of: B W D F Q T P NEAR
LABEL is another directive provided for compatibility with Intel/IBM assemblers. A86 provides less verbose ways of specifying all the above LABEL forms, except for LABEL FAR.
LABEL defines "name" to have the type given, and a value equal to the current output pointer. Thus, LABEL NEAR is synonymous with a simple colon following the name; and LABEL B (and its synonym LABEL BYTE), LABEL W, LABEL D, etc., are synonymous with DB, DW, DD, etc., with no operands.
LABEL FAR does have a unique functionality, not found in other assemblers. It identifies "name" as a procedure that can be called from outside this program's code segment. Such procedures should have RETFs instead of RETs. Furthermore, I have provided the following feature, unique to A86: if you CALL the procedure from within your program, A86 will generate a PUSH CS instruction followed by a NEAR call to the procedure. Other assemblers will generate a FAR call, having the same functional effect; but the FAR call consumes more program space, and takes more time to execute.
WARNING: you cannot use the above CALL feature as a forward reference; the LABEL FAR definition must precede any CALLs to it. This is unavoidable, since the assembler must assume that a CALL to an undefined symbol takes 3 program bytes. All assemblers will issue an error in this situation.
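A sketch of the LABEL FAR call optimization, with the definition placed ahead of the CALL as the warning requires. SHARED_SUB and MAIN are hypothetical names.

```asm
SHARED_SUB LABEL FAR   ; callable from outside this code segment
  INC AX               ; placeholder body
  RETF                 ; far procedures return with RETF

MAIN:
  CALL SHARED_SUB      ; A86 generates PUSH CS followed by a NEAR call
```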
The USE16 and USE32 directives, recognized only by A386, specify whether the following code will execute in a 16-bit segment, or a 32-bit segment. They are useful in COM mode, for assembling snippets of code whose loading and execution will be controlled by a program instead of the operating system. In OBJ mode, the same effect can be achieved by including these keywords in a SEGMENT directive (Chapter 10).
A86 allows the inclusion of alternate source files within the middle of a "parent" source file, via the INCLUDE directive. When you give the name INCLUDE followed by the name of a file, A86 will insert the contents of the named file into the assembly source stream, as if it were substituted for the INCLUDE line. There is no limit to the size of an INCLUDE file, and INCLUDEs may be nested (the file included may itself contain INCLUDE directives) to any level within reason. Parentheses are optional around the file name; if you don't give them, there must be at least one blank between the INCLUDE and the file name.
If there is no file name whatever following the INCLUDE, A86 will perform an A86LIB library search (see Chapter 13), and INCLUDE all library files necessary to resolve all undefined symbols at the point of the INCLUDE. This provides an "in-file" equivalent to the pound-sign given on the invocation line.
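The three INCLUDE forms described above might appear in a source file as follows. The file names EQUATES.8 and MACROS.8 are invented for illustration.

```asm
; file name separated from INCLUDE by at least one blank
INCLUDE EQUATES.8

; the same kind of request, with the optional parentheses
INCLUDE (MACROS.8)

; no file name: A86LIB library search for all symbols undefined so far
INCLUDE
```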
A86 allows you to produce either .COM files, which can be run immediately as standalone programs, or .OBJ files, to be fed to any DOS-based LINK program. In this chapter I'll discuss .OBJ mode of A86.
I'll start by giving you the minimum amount of information you need to know to produce .OBJ files. If you are writing short interface routines, and do not want to concern yourself with the esoterica of .OBJ files (segments, groups, publics, etc.), you can survive quite nicely by reading only this section.
There are two ways you can cause A86 to produce a .OBJ file as its object output. One way is to explicitly give .OBJ as the output file name: for example, you can assemble the source file FOO.8 by giving the command "A86 FOO.8 FOO.OBJ". The other way is to specify the switch +O (letter O not digit 0). This is illustrated by the invocation "A86 +O FOO.8", which will have the same effect as the first invocation.
My design philosophy for .OBJ production is to accommodate two types of user. The first type of user is writing new code, to link to other (usually high level language) modules. That person should be able to write the module with a minimum of red tape, and have A86 do the right thing. The second type of user has existing modules written for Intel/IBM assemblers, and wants to port them to A86. A86 should recognize and act upon all the relocation directives (SEGMENT, GROUP, PUBLIC, EXTRN, NAME, END) given. The assembly should work even if several files, assembled separately under the Intel/IBM assembler, are fed to a single A86 assembly. You'll see if you read on through this entire chapter that the multiple-files requirement causes A86 to interpret some of the relocation directives a little differently (while achieving compatible results).
Let's suppose you're writing new code: for example, an interface routine to the "C" language, that multiplies a 16-bit number by 10. "C" pushes the input number onto the stack, before calling your routine. Your code needs to get the number, multiply it by 10, and return the answer in the AX register. You can code it.
  _MUL10:            ; "C" expects all public names to start with "_"
    PUSH BP          ; "C" expects BP to be preserved
    MOV BP,SP        ; we use BP to address the stack
    MOV AX,[BP+4]    ; fetch the number N, beyond BP and the ret addr
    ADD AX,AX        ; 2N
    MOV BX,AX        ; 2N is saved in BX
    ADD AX,AX        ; 4N
    ADD AX,AX        ; 8N
    ADD AX,BX        ; 8N + 2N = 10N
    POP BP           ; BP is restored
    RET              ; go back to caller
These 11 lines can be your entire source file! If you name the file MUL10.8, A86 will create an object file MUL10.OBJ, that conforms to the standard SMALL model of computation for high level languages. If you use RETF instead of RET (thus, by the way, getting the operand from BP+6 instead of BP+4), the object module will conform to the standard LARGE model of computation. All the red tape information required by the high level language is provided implicitly by A86. I'll go through this information in detail later, but you should need to read about it only if you're curious.
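For comparison, here is a sketch of the LARGE-model variant just mentioned. Only two things change: the operand is fetched from BP+6, because the far return address occupies four bytes instead of two, and the routine ends with RETF.

```asm
_MUL10:            ; same routine, called with a far CALL
  PUSH BP          ; preserve BP
  MOV BP,SP        ; address the stack through BP
  MOV AX,[BP+6]    ; N lies beyond saved BP (2 bytes) and far ret addr (4 bytes)
  ADD AX,AX        ; 2N
  MOV BX,AX        ; 2N is saved in BX
  ADD AX,AX        ; 4N
  ADD AX,AX        ; 8N
  ADD AX,BX        ; 8N + 2N = 10N
  POP BP           ; BP is restored
  RETF             ; far return to the caller
```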
What happens if you need to access symbols outside the module you're assembling? If the type of the symbol is correctly guessed from the instruction that refers to it, then you can simply refer to it, and leave it undefined within the module. For example, if A86 sees the instruction CALL PRINT with PRINT undefined, it will assume that PRINT is a NEAR procedure. If PRINT is never defined within the module, A86 will act as if you declared PRINT via the directive EXTRN PRINT:NEAR. The address of PRINT will be plugged into your instruction by LINK when it combines A86's .OBJ file with the high level language's .OBJ files, to make the final program.
In general, the undefined operand to any CALL or JMP instruction is assumed to be NEAR. The second (source) operand to a MOV or arithmetic instruction is assumed to be ABS (i.e., an immediate constant). An undefined first (destination) operand is assumed to be a simple memory variable, of the same size (BYTE or WORD) as the register given in the second operand. If your external symbol does not comply with these guidelines, you need to declare it with an EXTRN before you use it. (You can also use EXTRN to declare types of non-complying forward references within your module, as you'll see later.)
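The guessing rules can be illustrated with three references to symbols that are never defined in the module. LIMIT and TOTAL are hypothetical external names; the comments state the type the rules above would assign to each.

```asm
  CALL PRINT        ; undefined CALL operand: assumed NEAR
  MOV CX,LIMIT      ; undefined source operand: assumed ABS (immediate constant)
  MOV TOTAL,AX      ; undefined destination with AX: assumed a word memory variable
```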
If you'd like to link the MUL10 procedure to Turbo Pascal V4.0 or later, you need to prepend the line CODE SEGMENT PUBLIC to the top of the program, to name the program segment according to Turbo Pascal's expectations. You may dispense with the leading underscore in the name MUL10-- Turbo Pascal does not require or expect it.
At this point, if you're a casual user, I think you've read enough to get going! Read further only if you wish; or if you get stuck, and need to master the esoterica.
When you assemble a program directly into a .COM file, the program has just two forms: the source program, that you can understand, and the .COM file, that the computer can "understand" (i.e., execute). A .OBJ file is an intermediate format: neither you nor the (executing) computer can make sense out of a .OBJ file; only programs like LINK interpret .OBJ files. The purpose of a .OBJ file is to allow you to assemble or compile just a part of a program. The other parts (also in the form of .OBJ files) can be produced at a different time; often by a different assembler or compiler, whose source files are in a different language. It's easy to see where the word "linkage" comes from: the LINK program puts the pieces of a program together. The "relocation" comes because the assembler or compiler that makes a given program piece doesn't know how many other pieces will come before it, or how big the other pieces will be. Each piece is constructed as if it started at location 0 within the program; then LINK "relocates" the piece to its true location.
Many of the relocation features of 86 assembly language are couched in terms of LINK's point of view, so we must look at the way LINK sees things. LINK calls a .OBJ file an "object module", or just "module". Each module has a NAME, that can be referred to when LINK issues diagnostic messages, such as error messages and symbol maps. If a program symbol is used only within a single module, it does not need to be given to LINK, except possibly to pass along to a symbolic debugger. On the other hand, if a program symbol is defined in one module and referenced in other modules, then LINK needs to know the name of the symbol, so it can resolve the references. Such a symbol is PUBLIC in the module in which it is defined; it is "external" in the other modules, containing references to it. Finally, exactly one module in a program must contain the starting location for the program; that module is called the "main module", and it must supply the starting address (which is not necessarily at the beginning of the module).
In the 86 family of microprocessors, the LINK system also does much to manage the memory segments that a program will fit into, and get its data from. The (grotesquely ornate) level of support for segmentation was dictated by Intel, when it specified (and IBM and the compiler makers accepted) the format that .OBJ files will have. I attended the fateful meeting at Intel, in which the crucial design decisions were made. I regret to say that I sat quietly, while engineers more senior than I applied their fertile imaginations to construct fanciful scenarios which they felt had to be supported by LINK. Let's now review the resulting segmentation model.
The parts of a program, as viewed by LINK, come in three different sizes: they can be (1) pieces of a single segment, (2) an entire single segment, or (3) a sequence of consecutive segments in 86 memory. Size (1) should have been called something like FRAGMENT, but is instead called SEGMENT. Size (2) should have been called SEGMENT, but is instead called GROUP. Size (3) should have been called "group", but is instead called "class". Let me cling to the sensible terminology for one more paragraph, while I describe the worst scenario Intel wanted to support; then when I discuss individual directives, I'll regretfully revert to the official terminology.
The scenario is as follows: suppose you have a program that occupies about 100K bytes of memory. The program contains a core of 20K bytes of utility routines that every part of the program calls. You'd like every part of the program to be able to call these routines, using the NEAR form to save memory. By gum, you can do it! You simply(!) slice the program into three fragments: the utility routines will go into fragment U, and the rest of the program will be split into equal-sized 40K-byte fragments A and B. Now you arrange the fragments in 8086 memory in the order A,U,B. The fragments A and U form a 60K-byte block, addressed by a segment register value G1, that points to the beginning of A. The fragments U and B form another 60K-byte block addressed by a segment register value G2, that points to the beginning of U. If you set the CS register to G1 when A is executing, and G2 when B is executing, the U fragment is accessible at all times. Since all direct JMPs and CALLs are encoded as relative offsets, the U-code will execute direct jumps correctly whether addressed by G1 with a huge offset, or G2 with a small offset. Of course, if U contains any absolute pointers referring to itself (such as an indirect near JMP or CALL), you're in trouble.
It's been many years since the fateful design meeting took place, and I can report that the above scenario has never taken place in the real world. And I can state with some authority that it never will. The reason is that the only programs that exceed 64K bytes in size are coded in high level language, not assembly language. High-level-language compilers follow a very, very restricted segmentation model-- no existing model comes remotely close to supporting the scheme suggested by the scenario. But the 86 assembly language can support it -- the directives "G1 GROUP A,U" and "G2 GROUP B,U", followed by chunks of code of the appropriate object size, headed by directives "A SEGMENT", "B SEGMENT", and "U SEGMENT". The LINK program is supposed to sort things out according to the scenario; but I can't say (and I have my doubts) if it actually succeeds in doing so.
The concept of "class" was added as an afterthought, to implement the more sensible and usable features that outsiders thought GROUPs were implementing; namely, the ability to specify that different (and disjoint!) segments occur consecutively in memory. This allows programs to be arranged in a consistent manner -- for example, with all program code followed by all static data segments followed by all dynamically allocated memory.
The NAME directive specifies that "module_name" be given to LINK as the name of the module produced by this assembly. The symbol "module_name" can be used elsewhere in your program without conflict: it can even, if you like, be a built-in assembler mnemonic (e.g. "NAME MOV" is acceptable)! If you do not provide a NAME directive, A86 will use the name of the output object file, without the .OBJ extension. If you provide more than one NAME directive, A86 will use the last one given, with no error reported.
The PUBLIC directive allows you to explicitly list the symbols defined in this assembly, that can be used by other modules. If you do not give any PUBLIC directives in your program, A86 will use every relocatable label and variable name in your program, except local labels (the redefinable labels consisting of a letter followed by digits: L7, M1, Q234, etc.). Symbols EQUated to constants, and symbols defined within structures and DATA SEGMENTs, are not implicitly declared PUBLIC: you have to explicitly include them in a PUBLIC directive.
A86 maintains an internal flag, telling it whether to figure out for itself which symbols are PUBLIC, or to let the program explicitly declare them. The flag starts out "implicit", and is set to "explicit" only if A86 sees a PUBLIC directive with no names at all, or a PUBLIC directive containing at least one name that would have been implicitly made PUBLIC.
If you are writing new code, you'll probably want to keep the flag "implicit". You use the PUBLIC directive only for those symbols which have the form of local labels, but aren't (e.g., a memory variable I1987 for 1987 income); and for absolute values that are globally accessed -- e.g., specify "PUBLIC OPEN_FILES_LIMIT" for a symbol defined as "OPEN_FILES_LIMIT EQU 20".
If you are porting existing code, that code will already have PUBLIC directives in it, and A86 will go to "explicit" mode, duplicating the functionality of other assemblers.
The PUBLIC directive with no names is used to force "explicit" mode, thus causing (if there are no further PUBLICs with names) the .OBJ file to declare no symbols PUBLIC.
There is another side effect to the PUBLIC directive: if a symbol is declared PUBLIC in a module, it had better be defined in that module. If it isn't, then A86 includes it in the .UND listing of undefined symbols in the module, and suppresses output of the object file.
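As a sketch of PUBLIC in new code that stays in "implicit" mode, the two symbols below are the ones used as examples earlier in this section. Neither would have been made PUBLIC implicitly, so naming them should leave the flag unchanged. The DW allocation for I1987 is an assumption made for illustration.

```asm
PUBLIC OPEN_FILES_LIMIT     ; an absolute value: never implicitly PUBLIC
PUBLIC I1987                ; has the form of a local label, but is a variable

OPEN_FILES_LIMIT EQU 20
I1987 DW ?                  ; 1987 income (assumed here to be a word variable)
```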
where "type" is one of: BYTE WORD DWORD QWORD TBYTE FAR
or synonymously: B W D Q T F
or: P NEAR ABS PROC
The EXTRN directive allows you to attach a type to a symbol that may not yet be defined (and may never be defined) within your program. This is often necessary for the assembler to generate the correct instruction form when the symbol is used as an operand. All the possible types except ABS and PROC are defined elsewhere in the A86 language, but I list them again here for convenience.
  B or BYTE: byte-sized memory variable
  W or WORD: word (2-byte) sized memory variable
  D or DWORD: doubleword (4-byte) sized memory variable
  Q or QWORD: quadword (8-byte) sized memory variable
  T or TBYTE: 10-byte-sized memory variable
  F or FAR: 4-byte program label accessed from outside this segment
  P: 6-byte program label accessed from outside this segment
  NEAR: program label accessed within a segment
  ABS: an absolute number (i.e., an immediate constant)
  PROC: same as NEAR unless you provide a PROC EQU FAR
An example of EXTRN usage is as follows: suppose there is a word memory variable IFARK in your program. The variable might be declared at the end of the program; or it might be defined in a module completely outside of this program. Without an EXTRN directive, A86 will assemble an instruction such as "MOV AX,IFARK" as the loading of an immediate constant IFARK into the AX register. If you place the directive "EXTRN IFARK:W" at the top of your program, you'll get the correct instruction form for MOV AX,IFARK -- moving a word-memory variable into the AX register.
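The IFARK case can be sketched as follows. This is only an illustration of the paragraph above; where IFARK is actually defined (later in the program, or in another module) is left open.

```asm
EXTRN IFARK:W       ; IFARK is a word memory variable, defined later or elsewhere

MOV AX,IFARK        ; now assembled as a load from memory, not as an immediate
```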
A86 will allow more than one EXTRN directive for a given symbol, as long as the same type is given every time. A86 will even allow an EXTRN directive for a symbol that has already been defined, as long as the type declared is consistent with the symbol's definition. These allowances exist so that you can assemble multiple files written for another assembler, that had been fed separately to that assembler.
Note that EXTRN is viewed quite differently by A86 than by other assemblers. In fact, if it weren't for those other assemblers, I'd use the mnemonic DECLARE instead of EXTRN. A86 doesn't really use EXTRN to determine which symbols are external -- it uses those symbols that are undefined at the end of assembly. As I stated earlier in the chapter, an undefined symbol can be referenced without being declared via EXTRN. Conversely, a defined symbol can be declared (and redeclared) via EXTRN; being defined, such a symbol will not be specified "external" in the .OBJ file.
Because EXTRN is useful in forward reference situations, it is recognized even when A86 is assembling a .COM file.
For those of you who are accustomed to the more traditional use of EXTRN, and who do not like external records to be created "behind your back", A86 offers the "+G16" option. If you include "+G16" in the program invocation, A86 will require that all undefined symbols be explicitly declared via an EXTRN. Any undefined, undeclared symbols will be included in the .UND listing of undefined symbols, and object-file output will be suppressed.
I've already stated that exactly one module in a program is the "main" module, containing the starting address of the entire program. In A86 when assembling .OBJ files, the starting address is given by the label MAIN. You simply provide the label MAIN: where you want the program to start. The module containing MAIN is the main module. Note that if you have the +c case-sensitivity switch enabled, MAIN must be in all-caps.
The END directive is used by other assemblers for two purposes, both of which are now a little silly. The first purpose is to signal the end of assembly. This was necessary back in the days when source files were input on media such as paper tape: you had to tell the assembler explicitly that the content of the tape has ended. Today the operating system can tell you when you've reached the end of the file, so this function is an anachronism.
The second purpose of END is, nonsensically, to allow you to specify the starting location of the program. I suppose the person who wrote the first assembler back in the 1950's was too short on memory to implement a separate START directive, or a MAIN label like A86 has, and decided to let END do double duty. I've always considered the example "END START" to have an Alice-in-Wonderland quality; it is fuel for the high-level-language snobs who like to attack assembly language. Please defeat the snobs, and use MAIN: if you are writing new code.
For compatibility, A86 treats "END start_addr" exactly the same as if you had coded "MAIN EQU start_addr". Note that if you want your program to assemble under both A86 and other assemblers, you can specify "END MAIN". A86 treats MAIN EQU MAIN as a legal redefinition of the symbol MAIN.
A86 ignores END when there is no starting-address operand, thus allowing assembly of multiple files written for other assemblers.
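A minimal sketch of a main module that should assemble under both A86 and other assemblers, following the "END MAIN" advice above. The DOS termination call is an assumption added only to give the program a body; segment declarations that other assemblers require are omitted.

```asm
MAIN:               ; A86 takes this label as the starting address
  MOV AX,04C00      ; DOS function 4C (hex): terminate with return code 0
  INT 021
  END MAIN          ; wanted by other assemblers; A86 treats it as MAIN EQU MAIN
```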
In a SEGMENT directive, "align" is one of: BYTE WORD DWORD PARA PAGE
"combine" is one of: PUBLIC STACK COMMON MEMORY AT number
"use" is one of: USE16 USE32 FLAT
The SEGMENT directive says that assembled object code will henceforth go to a block of code whose name is "seg_name". "seg_name" is a symbol that represents a value that can be loaded into a segment register. If "seg_name" is not declared in a GROUP directive, then its value should in fact be loaded into a segment register, in order to address the code. If "seg_name" is declared in a GROUP directive, then the code is a part of the segment addressed by the name of the group.
A program can consist of any number of named segments, to be combined in numerous exotic ways to produce the final program. You can redirect your object output from one segment to another in your assembly, by providing a SEGMENT directive before each piece of code. You can even return to a segment you started earlier, by repeating a SEGMENT with the same name -- the assembler just picks up where it left off, subject to some possible skipping for memory alignment, that I'll describe shortly.
The specifications following the word SEGMENT help to describe how the code in this module's part of the segment will be combined with code for the same segment name given in other modules; and also how this named segment will be grouped with other named segments. Other assemblers require the specifications to be given in the order indicated. A86 will accept any order, and will accept commas between the specifications if you want to provide them. The only restriction is that "AT number" must be followed by a comma if it is not the last specification on the line.
The "align" specification tells if each piece of code within the segment should be aligned so that its starting address is an even multiple of some number. BYTE alignment means there is no requirement; WORD alignment requires each piece to start at a multiple of 2; DWORD alignment, at a multiple of 4; PARA alignment, at a multiple of 16; PAGE alignment, at a multiple of 256. For example, suppose you have a segment containing memory variables. You can declare the segment with the statement "VAR_DATA SEGMENT WORD", which ensures that the segment is aligned to an even memory address. That way you can ensure that all 16-bit and bigger memory quantities in the segment are aligned to even addresses, for faster access on the 16-bit machines of the 86 family.
There are special rules governing alignment for multiple pieces of the same named segment within the same program module. Other assemblers outlaw conflicting alignment specifications in this situation; A86 accepts them, and uses the strictest specification given. Furthermore, the alignment given for any specification beyond the first will control the alignment for that piece of code within this module's chunk. For example, if a program contains two pieces of code headed by "VAR_DATA SEGMENT WORD", A86 will insert a byte between the pieces if the first piece has an odd number of bytes. This ensures correct assembly for multiple files written for another assembler.
If no "align" type is given for any of the pieces of a named segment, an alignment of PARA is assumed.
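The two-piece VAR_DATA case described above might look like the sketch below. The variable names are hypothetical, and the padding noted in the comments is what the stated rule implies rather than something verified here.

```asm
VAR_DATA SEGMENT WORD
FLAG_BYTE DB ?          ; first piece ends after an odd number of bytes (1)

VAR_DATA SEGMENT WORD   ; second piece of the same segment, same module
COUNT_WORD DW ?         ; per the rule above, A86 inserts one byte first,
                        ;   so COUNT_WORD starts at an even offset
```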
The "combine" specification tells how the chunk of code from this module will be combined with the chunks of the same named segment, that come from other modules. Yes, I know, that sounds like what "align" does; but "combine" takes a different, more major point of view.
  STACK SEGMENT WORD STACK
    DW 100 DUP (?)
  TOP_OF_STACK:
The code just given declares a stack area of 200 bytes (100 words) for this module. If identical code occurs in each of three modules which are then linked together, the resulting STACK segment will have 600 bytes (the sizes are added), but TOP_OF_STACK will be the same address (600) for each module (each piece is overlaid at the top of the segment). That way, every module can declare and access the top of the stack, which is the only static part of the stack that any code should ever refer to.
The combine type specification can be repeated in subsequent pieces of a given segment, but if it is, it must be the same in all pieces.
Finally, if no combine type is ever given for a named segment in a module, that segment is non-combinable -- no other modules may define that segment; the code given in the one module constitutes the entire segment.
The "use" specification, recognized only by A386, tells whether the segment is a 32-bit segment. The default value USE16 means the segment is a 16-bit segment: 32-bit registers, memory operands, and memory indexing will be encoded using address and operand override bytes. The value USE32 specifies a 32-bit segment (to be loaded and run only in protected-mode environments such as Windows or OS/2). In 32-bit segments, the 16-bit operands and addresses require the override opcode bytes, and the 32-bit operands and addresses don't. Finally, the FLAT specification is identical to USE32, with the additional implicit proviso that the segment will be contained in a group named FLAT. This accommodates the OS/2 FLAT model of segmentation, in which everything resides in a single, huge segment.
The last specification available on a SEGMENT line is the class name, which is identified by being enclosed in single quotes. Unlike a segment name, which can be used as an instruction operand and hence cannot conflict with other assembler symbols, a class name can be assigned without regard to its usage elsewhere in the program. It can even be a built-in A86 mnemonic. In fact, both the SMALL and LARGE high-level-language models specify the class name 'CODE' for code segments, and the SMALL model specifies the class name 'DATA'.
If no class name is given for a segment, A86 specifies the null (zero length) string as the class name.
The DATA SEGMENT and STRUC directives work in .OBJ mode exactly as they do in .COM mode-- they define a special assembly mode, in which declarations are made, but no object code is output. Offsets within DATA segments and structures are absolute, as in .COM mode. Assembly resumes as before when an ENDS or CODE SEGMENT directive is encountered.
For MASM compatibility (especially in modules written to link to Turbo Pascal V4.0 programs), A86 recognizes the reserved symbols CODE, DATA, and STACK as ordinary relocatable segment names. The ordinary functionality takes effect whenever a SEGMENT directive is given with CODE, DATA or STACK as the segment name, and with one or more relocatable parameters (e.g., PUBLIC) given after SEGMENT.
The ENDS directive closes out the segment currently being assembled, and returns assembly to the segment being assembled before the last SEGMENT directive. The "seg_name", if given, must match the name in that last SEGMENT directive. ENDS allows you to "nest" segments inside one another. For example, you can declare some static data variables that are specific to a certain section of code at the top of that section.
  _DATA SEGMENT BYTE PUBLIC 'DATA'
  VAR1 DB ?
  VAR2 DB ?
  _DATA ENDS
These four lines can be inserted inside any other segment being assembled. They will cause the two variable allocations to be tacked onto the segment _DATA; and assembly will then continue in whatever segment surrounded the four lines. Observe that the "nesting" does not occur in the final program; only the presentation of the source code is nested.
If you are not nesting segments inside one another, then the ENDS directive serves only to lend a clean, "block-structured" appearance to your source code. It does not assist A86 in any particular way; in fact, it consumes a bit more object output memory (slightly reducing object output capacity) if you have ENDSs, rather than just starting up new segments with SEGMENT directives.
Other assemblers outlaw any code outside of a SEGMENT declaration, forcing you to give a SEGMENT declaration before you can assemble anything. A86 lets you assemble just your code; you don't have to worry about SEGMENTs if you don't want to.
If you do provide code outside of all SEGMENT declarations, A86 performs the following steps, to find a reasonable place to put the code.
The GROUP directive causes A86 to tell LINK that all the listed segments can fit into a single 64K-byte block of memory, and instruct LINK to make that fit. (If they won't fit, LINK will issue an error message.) Having declared the group, you can then use "group_name" as the segment register value that will allow simultaneous access to all the named segments. The order of names given in the list does not necessarily determine the order in which the segments will finally appear within the group.
The most useful application of the GROUP directive is to allow you to structure the pieces of a program, all of whose code and data will fit into a single 64K segment. You organize the pieces into SEGMENTs, and declare all the SEGMENTs to be within the same GROUP. When the program starts, all segment registers are set to point to the GROUP, and you never have to worry about segment registers again in the program.
WARNING: If your segments will be GROUPed in the final program, you should have the appropriate GROUP directive in every module assembled. If you don't, then any memory pointers generated will be relative to the beginning of the individual named segments, not to the beginning of the whole group.
Because of the obscure scenario I described in the Overview section, Intel does not prohibit more than one GROUP from containing some of the same segments; so neither does A86. Any pointers within a segment will be calculated from the beginning of the last GROUP within which the segment was declared. But again, I have my doubts as to whether LINK will handle this correctly.
The SEG operator returns the segment containing its operand -- a value suitable for loading into one of the segment registers. If the operand is an explicit far constant such as 01811:0100, the value returned is the lefthand component of the constant (01811 in this example). Otherwise, the result depends on A86's output mode.
When A86 is assembling to an OBJ file, the result is the named relocatable segment containing the operand. SEG is most useful when the operand is not defined in this A86 module: in that case, the segment value will be plugged in by LINK.
When A86 is assembling to a COM file, SEG always returns the CS register, with one exception: symbols declared within a SEGMENT AT structure return the value of the containing segment. COM files have no facility for explicitly specifying relocatable segments, so for compatibility A86 assumes that all non-absolute segment references are to the program's segment itself.
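As a small sketch of the most useful case of SEG (an operand not defined in this module), the lines below load the segment of an external variable into ES. FAR_TABLE is a hypothetical name introduced only for this illustration.

```asm
EXTRN FAR_TABLE:W       ; hypothetical word variable defined in another module

MOV AX,SEG FAR_TABLE    ; in an .OBJ assembly, LINK plugs in the segment value
MOV ES,AX               ; ES can now be used to address FAR_TABLE
```

In a .COM assembly, by the rule above, the same SEG reference would simply yield CS.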
A86 contains an easy-to-use, but very powerful macro facility. The facility subsumes the capabilities of most assemblers, including operand concatenation, repeat, indefinite repeat (often called IRP), indefinite repeat character (IRPC), passing macro operands by text or by value, comparing macro operands to strings, and detecting blank macro operands. Unlike other assemblers, A86 integrates these functions into the main macro facility; so they can be invoked without clumsy syntax, or strange characters in the macro-call operands.
All macros must be defined before they are used. A macro definition consists of the name of the macro, followed by the word MACRO, followed by the text of the macro, followed by #EM, which marks the end of the macro.
Many assembly languages require a list of dummy operand names to follow the word MACRO. A86 does not: the operands are denoted in the text with the fixed names #1, #2, #3, ... up to a limit of #9, for each operand in order. If there is anything following the word MACRO, it is considered part of the macro text.
Examples.
  ; CLEAR sets the register operand to zero.

  CLEAR MACRO SUB #1,#1 #EM

  CLEAR AX    ; generates a SUB AX,AX instruction
  CLEAR BX    ; generates a SUB BX,BX instruction

  ; MOVM moves the second operand to the first operand.
  ;   Both operands can be memory variables.

  MOVM MACRO
    MOV AL,#2
    MOV #1,AL
  #EM

  VAR1 DB ?
  VAR2 DB ?

  MOVM VAR1,VAR2   ; generates MOV AL,VAR2 followed by MOV VAR1,AL
The format of a macro definition is flexible. If the macro text consists of a single instruction, the definition can be given in a single line, as in the CLEAR macro given above. There is no particular advantage to doing this, however: A86 prunes all unnecessary spaces, blank lines, and comments from the macro text before entering the text into the symbol table. I recommend the more spread-out format of the MOVM macro, for program readability.
All special macro operators within a macro definition begin with a hash sign # (a hex 23 byte). The letters following the hash sign can be given in either upper case or lower case. Hash-sign operators are recognized even within quoted strings. If you wish the hash sign to be treated literally, and not as the start of a special macro operator, you must give 2 consecutive hash signs: ##. For example.
  FOO MACRO
    DB '##1'
    DB '#1'
  #em

  FOO abc     ; produces DB '#1' followed by DB 'abc'
The format of the macro call line is also flexible. A macro call consists of the name of the macro, followed by the operands to be plugged into the macro. A86 prunes leading and trailing blanks from the operands of a macro call. The operands to a macro call are always separated by commas. Also, as in all A86 source lines, a semi-colon occurring outside of a quoted string is the start of a comment, ignored by A86. If you want to include commas, blanks, or semi-colons in your operands, you must enclose your operand in single quotes.
Some macro assemblers expect the operands to macro calls to follow the same syntax as the operands to instructions. In those assemblers, the operands are parsed, and reduced to numeric values before being plugged into the macro definition text. This is called "passing by value". As its default, A86 does not pass by value, it passes by text. The only parsing of operands done by the macro processor is to determine the start and the finish of the operand text. That text is substituted, without regard for its contents, for the "#n" that appears in the macro definition. The text is interpreted by A86 only after a complete line is expanded and as it is assembled.
If the first non-blank character after the macro name is a comma, then the first operand is null: any occurrences of #1 in the macro text will be deleted, and replaced with nothing. Likewise, any two consecutive commas with no non-blanks between them will result in the corresponding null operand. Also, out-of-range operands are null; for example, #3 is a null operand if only two operands are provided in the call.
Null operands to macros are not in themselves illegal. They will produce errors only if the resulting macro expansion is illegal.
The method of passing by text allows operand text to be plugged anywhere into a macro, even within symbol names. For example.
  ; KF_ENTRY creates an entry in the KFUNCS table, consisting of a
  ;   pointer to a KF_ action routine.  It also declares the
  ;   corresponding CF_ symbol, which is the index within the table
  ;   for that entry.

  KF_ENTRY MACRO
    CF_#1 EQU ($-KFUNCS)/2+080
    DW KF_#1
  #EM

  KFUNCS:
    KF_ENTRY UP
    KF_ENTRY DOWN

  ; The above code is equivalent to:
  ;
  ; KFUNCS:
  ;   DW KF_UP
  ;   DW KF_DOWN
  ;
  ; CF_UP EQU 080
  ; CF_DOWN EQU 081
As mentioned before, if you want to include blanks, commas, or semicolons in your operands, you enclose the operand in single quotes. In the vast majority of cases in which these special characters need to be part of operands, the user wants them to be quoted in the final, assembled line also. Therefore, the quotes are passed in the operand. To override this, and strip the quotes from the string, you precede the quoted string with a hash sign. Examples.
  DBW MACRO
    DB #1
    DW #2
  #EM

  DBW 'E', E_POINTER
  DBW 'W', W_POINTER

  ; note that if quotes were not passed, the above lines would have
  ;   to be DBW '''E''', E_POINTER;  DBW '''W''', W_POINTER

  FETCH_CHAR MACRO
    LODSB
    #1
    CALL PROCESS_CHAR
  #EM

  FETCH_CHAR STOSB       ; generates STOSB as second instruction
  FETCH_CHAR #'INC DI'   ; generates INC DI as second instruction
A86's macro facility contains two kinds of loops: you can loop once for each operand in a range of operands; or you can loop once for each character within an operand. The first kind of loop, the R-loop, is discussed in this section; the second kind, the C-loop, is discussed later.
An R-loop is a stretch of macro-definition code that is repeated when the macro is expanded. In addition to the fixed operands #1 through #9, you can specify a variable operand, whose number changes each time through the loop. You give the variable operand one of the 4 names #W, #X, #Y, or #Z.
An R-loop begins with #R, followed immediately by the letter W,X,Y, or Z naming the variable, followed by the number of the first operand to be used, followed by the number of the last operand to be used. After the #Rxnn is the text to be repeated. The R-loop ends with #ER. For example.
  STORE3 MACRO
    MOV AX,#1
  #RY24            ; "repeat for Y running from 2 through 4"
    MOV #Y,AX
  #ER
  #EM

  STORE3 VAR1,VAR2,VAR3,VAR4

  ; the above call produces the 4 instructions MOV AX,VAR1; MOV VAR2,AX;
  ;   MOV VAR3,AX; MOV VAR4,AX.
A86 recognizes the special operator #L, which is the last operand in a macro call. #L can appear anywhere in macro text; but its big power occurs in conjunction with R-loops, to yield an indefinite-repeat facility.
A common example is as follows: you can take any macro that is designed for one operand, and easily convert it into a macro that accepts any number of operands. You do this by placing the command #RX1L, "repeat for X running from 1 through L", at the start of the macro, and the command #ER at the end just before the #EM. Finally, you replace all instances of #1 in the macro with #X. We see how this works with the CLEAR macro.
  CLEAR MACRO #RX1L
    SUB #X,#X
  #ER
  #EM

  CLEAR AX,BX    ; generates both SUB AX,AX and SUB BX,BX in one macro!
It is possible for R-loops to iterate zero times. In this case, the loop-text is skipped completely. For example, CLEAR without any operands would produce no expanded text.
We have seen the R-loop; now we discuss the other kind of loop in macros, the character loop, or C-loop. In the C-loop, the variable W,X,Y, or Z does not represent an entire operand; it represents a character within an operand.
You start a C-loop with #C, followed by one of the 4 letters W,X,Y, or Z, followed by a single operand specifier -- a digit, the letter L, another one of W,X, Y, or Z defined in an outer loop, or one of the more complicated specifiers defined later in this chapter. Following the #Cxn is the text of the C-loop. The C-loop ends with #EC. The macro will loop once for every character in the operand. That single character will be substituted for each instance of the indicated variable operand. For example.
  PUSHC MACRO #CW1
    PUSH #WX
  #EC#EM

  PUSHC ABC     ; generates PUSH AX | PUSH BX | PUSH CX
If the C-operand is quoted in the macro call, the quotes ARE removed from the operand before passing characters to the loop. It is not necessary to precede the quoted string with a hash sign in this case. If you do, the hash sign will be passed as the first character.
If the C-operand is a null operand (no characters in it), the loop text is skipped completely.
So far, we have seen that you can specify operands in your macro in fourteen different ways: 1,2,3,4,5,6,7,8,9,W,X,Y,Z,L. We now multiply these 14 possibilities, by introducing the "A" and "B" operators. You can precede any of the 14 specifiers with "A" or "B", to get the adjacent operand after or before the specified operand. For example, BL means the operand just before the last operand; in other words, the second-to-the-last operand. AZ means the operand just after the Z operand. You can even repeat, up to a limit of 4 "B"s or 3 "A"s: for example, BBL is the third-to-last operand.
Note that any operand specifier can appear in contexts other than by itself following a # within a macro. For example, BBL could appear as the upper limit to an R-loop: #RZ1BBL loops with Z running from the first operand to the third-to-last operand.
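A short sketch of the "B" operator applied to #L follows. MOVLAST is a hypothetical macro invented for this illustration, and the expansion shown is what the rules above imply.

```asm
; MOVLAST copies the second-to-last operand into the last operand.
MOVLAST MACRO
  MOV #L,#BL
#EM

MOVLAST AX,BX,CX    ; by the rules above, should expand to MOV CX,BX
```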
In the case of the variable operand to a C-loop, the "A" and "B" specifiers denote the characters after or before the current looping-character. An example of this is given in the next section.
We have seen that you end an R-loop with a #ER, and you end a C-loop with a #EC. We now present another way to end these loops; a way that lets you specify a larger increment to the macro's loop counter. You can end your loops with one of the 4 additional commands #E1, #E2, #E3, or #E4.
For R-loops terminated by #ER, the variable operand advances to the next operand when the loop is made. If you end your R-loop with #E2, the variable operand advances 2 operands, not just one. For #E3, it advances 3 operands; for #E4, 4 operands. The #E1 command is the same as #ER.
The most common usage of this feature is as follows: You will recall that we generalized the CLEAR macro with the #L-variable, so that it would take an indefinite number of operands. Suppose we want to do the same thing with the DBW macro. We would like DBW to take any number of operands, and alternate DBs and DWs indefinitely on the operands. This is made possible by creating an R-loop terminated by #E2.
  DBW MACRO #RX1L
    DB #X
    DW #AX
  #E2
  #EM

  DBW 'E',E_POINTER, 'W',W_POINTER   ; two pairs on same line!
The #E2 terminator means that we are looping on a pair of operands. Note the crucial usage of the "A"-after operator to specify the second operand of the operand pair.
A special note applies to the DBW macro above: A86 just happens to accept a DW directive with no operands (it generates no object code, and issues no error). This means that DBW will accept an odd number of operands with no error, and do the expected thing (it alternates bytes and words, ending with a byte).
You could likewise generalize a macro with 3 or 4 operands, to an indefinite number of triples or quadruples; by ending the R-loop with #E3 or #E4. The operands in each group would be specified by #X, #AX, #AAX, and, for #E4, #AAAX.
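The triple case can be sketched in the same pattern as the DBW macro. DBWW and the operand names E_LIMIT and W_LIMIT are hypothetical, introduced only for this illustration.

```asm
; DBWW emits one byte followed by two words for each triple of operands.
DBWW MACRO #RX1L
  DB #X
  DW #AX
  DW #AAX
#E3
#EM

DBWW 'E',E_POINTER,E_LIMIT, 'W',W_POINTER,W_LIMIT   ; two triples on one line
```

With six operands, X takes the values 1 and 4; the next step (7) is beyond the last operand, so the loop ends.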
For C-loops terminated by #E1 through #E4, the character pointer is advanced the specified number of characters. You use this in much the same way as for R-loops, to create loops on pairs, triplets, and quadruplets of characters. For example.
  PUSHC2 MACRO #CZ1
    PUSH #Z#AZ
  #E2
  #EM

  PUSHC2 AXBXSIDI    ; generates PUSH AX | PUSH BX | PUSH SI | PUSH DI
We now introduce another form of R-loop, called the Q-loop -- the negative repeat loop. This loop is the same as the R-loop, except that the operand number decrements instead of increments; and the loop exits when the number goes below the finish-number, not above it. The Q-loop is specified by #Qxnn instead of #Rxnn, and #EQ instead of #ER. You can also use the multiple-decrement forms #E1 #E2 #E3 or #E4 to terminate a Q-loop.
Example.
  MOVN MACRO #QXL2       ; "negative repeat X from L down to 2"
    MOV #BX,#X
  #EQ#EM

  MOVN AX,BX,CX,DX       ; generates the three instructions:
                         ;   MOV CX,DX
                         ;   MOV BX,CX
                         ;   MOV AX,BX
Note: the above functionality is already built into the MOV instruction of A86. The macro shows how you would implement it if you did not already have this facility.
A86 allows nesting of loops within each other. Since we provide the 4 identifiers W,X,Y,Z for the loop operands, you can nest to a level of 4 without restriction -- just use a different letter for each nesting level. You can nest even deeper, for example, by having two nested R-loops that use W as their indexing letter. The only restriction to this is that you cannot refer to the W of the outer loop from within the inner W loop. (I challenge anyone to come up with an application in which these limitations/restrictions cause a genuine inconvenience!)
If you have a loop or loops ending when the macro ends, and if the increment for those loops is 1 (that is, they would end with #ER, #EC, or #EQ rather than #E2, #E3, or #E4), you may omit the #ER, #EC, or #EQ. A86 closes all open loops when it sees #EM, with no error.
For example, if you omit the #ER for the loop version of the CLEAR macro, it would make no difference -- A86 automatically places an #ER code into the macro definition for you.
As already stated, A86's default mode for passing operands is by text -- the characters of the operand are copied to the macro expansion line as-is, without any evaluation. You may override this with the #V operator. When A86 sees #Vn in a macro definition, it will evaluate the expression given in the text of operand n, and pass a string representing the decimal constant answer, instead of the original text. The operand must evaluate to an absolute constant value, less than 65536. For example.
  JLV MACRO
    J#1 LABEL#V2
  #EM

  JINDEX = 3
  JLV NC,JINDEX+1     ; generates JNC LABEL4
  JINDEX = 6
  JLV Z,JINDEX+2      ; generates JZ LABEL8
The construct #Sn is translated by A86 into the decimal string representing the number of characters in operand n. One use of this would be to make a conditional-assembly test of whether an operand was passed at all, as we'll see later in this chapter. Another use is to generate a length byte preceding a string, as required by some high-level languages such as Turbo Pascal. Example.
  LSTRING MACRO
    DB #S1,'#1'
  #EM

  LSTRING SAMPLE     ; generates DB 6,'SAMPLE'
The construct #Nn is translated by A86 into the decimal string representing the position number n of the macro operand. Note that this value does not depend on the contents of the operand that was passed to the macro. Thus, for example, #N2 would translate simply to 2; so this usage of #N is silly. #N achieves usefulness when n is variable: W,X,Y,Z, or L. I give an example of #N with a loop-control variable in the next section. Here is an example of #NL, used to generate an array of strings, preceded by a byte telling how many strings are in the array.
  ZSTRINGS MACRO
    DB #NL           ; generates the number of operands passed
  #RX1L
    DB '#X',0
  #EM

  ZSTRINGS TOM,DICK,HARRY   ; generates DB 3 followed by strings
We've seen that macro operands are usually specified in your macro definition by a single character: either a single digit or one of the special letters W,X,Y,Z, or L. A86 also allows you to specify a constant operand number up to 127. You do so by giving an expression enclosed in parentheses, rather than a single character. The expression must evaluate at the time the macro is defined, to a constant between 0 and 127. You can use this feature to translate many programs that use MASM's REPT directive. For example, if the following REPT construct occurs within a MASM macro.
  TEMP = 0
  REPT 100
    TEMP = TEMP + 1    ; MASM needs an explicitly-set-up counter
    DB TEMP
  ENDM
you may translate it into an A86 loop, as follows.
  #RX1(100)            ; the counter X is built into the A86 loop
    DB #NX
  #ER

If the REPT does not occur within a macro, you must define a macro containing the loop, which you may then immediately call.
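For a REPT that stands outside any macro, the wrapping might be sketched as below. COUNT100 is a hypothetical name, and the expansion noted in the final comment is what the rules in this chapter imply.

```asm
; COUNT100 is a hypothetical wrapper macro; it takes no operands.
COUNT100 MACRO
#RX1(100)           ; X runs from 1 through 100
  DB #NX            ; #NX is the position number itself, not operand text
#ER
#EM

COUNT100            ; call it immediately: should emit DB 1 through DB 100
```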
Note that the expression enclosed in parentheses must not itself contain any macro operators. Thus, for example, you cannot specify #(#NY+1) to represent the operand after Y -- you must use #AY.
For MASM compatibility, A86 offers the #EX operator, which is equivalent to MASM's EXITM directive. #EX is typically used in a conditional-assembly block within a loop, to terminate the loop early. When the #EX code is seen in a macro expansion, the expansion ceases at that point, and assembly returns to the source file (or to the outer macro in a nested call). You couldn't use #EM to do this, because that would signal the end of the macro definition, not just the call.
Some assemblers have a LOCAL pseudo-op that is used in conjunction with macros. Symbols declared LOCAL to a macro have unique (and bizarre) symbol names substituted for them each time the macro is called. This solves the problem of duplicate label definitions when a macro is called more than once.
In A86, the problem is solved more elegantly, by having a class of generic local labels throughout assembly, not just in macros. Recall that symbols consisting of a single letter, followed by one or more decimal digits, can be redefined. You can use such labels in your macro definitions.
I have recommended that local labels outside of macros be designated L1 through L9. Within macro definitions, I suggest that you use labels M1 through M9. If you used an Ln-label within a macro, you would have to make sure that you never call the macro within the range of definition of another Ln-label with the same name. By using Mn-labels, you reduce such potential conflicts, although you still need to make sure that if you have a macro that calls another macro, and that both macros use local labels, you are naming the labels differently in the two macros.
The following example of a local label within a macro is taken from the source of the macro processor itself.
; "JHASH label" checks to see if AL is a hash sign. If it is,
;   it processes the hash sign term, and jumps to label.
;   Otherwise, it drops through to the following code.

JHASH MACRO
  CMP AL,'##'        ; is the scanned character a hash sign?
  JNE >M1            ; skip if not
  CALL MDEF_HASH     ; process the hash sign
  JMP #1             ; jump to the label provided
M1:
#EM

  ...
L3:                  ; loop here to eat empty lines, leading blanks
  CALL SKIP_BLANKS   ; skip over the leading blanks of a line
  INC SI             ; advance source ptr beyond the next non-blank
  JHASH L3           ; if hash sign then process, and eat more blanks
  CMP AL,0A          ; were the blanks terminated by a linefeed?
  JE L3              ; loop if yes, nothing on this line
L5:                  ; loop here after a line is seen to have contents
  CMP AL,';'         ; have we reached the start of a comment?
  JE L1              ; jump if yes, to consume the comment
  JHASH >L6          ; if hash sign then process it; get next char
  ...
L6:
  LODSB              ; fetch the next definition char from the source
  CMP AL,' '         ; is it blank?
  JA L5              ; loop if not, to process it
  ...
There are two ways that A86 lets you debug your macro expansions. First, if A86 encounters an error within a macro expansion, it now includes the offending expansion line within the error message. This will often allow you to spot the problem. If you need a complete listing of the expanded macro, the A86 listing will now give you that. These facilities replace the old EXMAC tool, which sometimes failed to expand excessively complicated macros the way the assembler did.
A86 has a conditional-assembly feature which allows you to specify that blocks of source code will or will not be assembled, according to the values of equated user symbols. The controlling symbols can be declared in the program (and can thus be the result of assembly-time expressions), or they can be declared in the assembler invocation.
You should keep in mind the difference between conditional assembly, invoked by #IF, and the structured-programming feature, invoked by IF without the hash sign. #IF tests a condition at assembly time, and can cause code to not be assembled and thus not appear in the program. IF causes code to be assembled that tests a condition at run time, possibly jumping over code. The skipped code will always appear in the program.
All conditional-assembly lines are identified by a hash sign # as the first non-blank character of a line. The hash sign is followed by one of the four reserved symbols IF, ELSEIF, ELSE or ENDIF.
#IF starts a conditional-assembly block. On the same line, following the #IF, you provide either a single name, or an arbitrary expression evaluating to an absolute constant. In this context, a single name evaluates to TRUE if it is defined and not equal to the absolute constant zero. A name is FALSE if it is undefined, or if it has been equated to zero. An expression is TRUE if nonzero, FALSE if zero.
If the #IF expression evaluates to FALSE, then the following lines of code are skipped, up to the next matching #ELSEIF, #ELSE, or #ENDIF. If the expression is TRUE, then the following lines of code are assembled normally. If a subsequent matching #ELSEIF or #ELSE is encountered, then code is skipped up to the matching #ENDIF.
#ELSEIF provides a multiple-choice facility for #IF-blocks. You can give any number of #ELSEIFs between an #IF and its matching #ENDIF. Each #ELSEIF has a name or expression following it on the same line. If the construct following the #IF is FALSE, then the assembler looks for the first TRUE construct following an #ELSEIF, and assembles that block of code. If there are no TRUE #ELSEIFs, then the #ELSE-block (if there is one) is assembled.
You should use the ! instead of the NOT operator in conditional-assembly expressions. The ! operator performs the correct translation of names into TRUE or FALSE values, and handles the case !undefined without reporting an error.
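As a minimal sketch of how the ! operator treats names, consider the fragment below. The symbols DEMO and FULL are hypothetical, chosen only for illustration: DEMO is equated to zero, and FULL is assumed never to be defined anywhere in the program. By the rules given above, both blocks should be assembled.

```asm
DEMO EQU 0

#if !DEMO
  DB 1      ; assembled: DEMO is defined but zero, so !DEMO is TRUE
#endif

#if !FULL
  DB 2      ; assembled: FULL is undefined, so !FULL is TRUE, with no error
#endif
```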
#ELSE marks the beginning of code to be assembled if all the previous blocks of an #IF have been skipped over. There is no operand after the #ELSE. There can be at most one #ELSE in an #IF-block, and it must appear after any #ELSEIFs.
#ENDIF marks the end of an #IF-block. There is no operand after #ENDIF.
It is legal to have nested #IF-blocks; that is, #IF-blocks that are contained within other #IF-blocks. #ELSEIF, #ELSE, and #ENDIF always refer to the innermost nested #IF-block.
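The pairing rule for nested blocks can be sketched as follows. COLOR and DEBUG are hypothetical control symbols, and the DB values are arbitrary markers. The indentation is only for readability; it relies on the rule that the hash sign need only be the first non-blank character of the line.

```asm
#if COLOR
  DB 1
  #if DEBUG
  DB 2       ; assembled only when COLOR and DEBUG are both TRUE
  #else
  DB 3       ; this #else belongs to the inner #if DEBUG
  #endif
#else
  DB 4       ; this #else belongs to the outer #if COLOR
#endif
```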
As an example of conditional assembly, suppose that you have a program that comes in three versions: one for Texas, one for Oklahoma, and one for the rest of the nation. The three programs differ in a limited number of places. Instead of keeping three different versions of the source code, you can keep one version, and use conditional assembly on the boolean variables TEXAS and OKLAHOMA to control the assembler output. A sample block would be:

#if TEXAS
  DB 0,1,2,3
#elseif OKLAHOMA
  DB 4,5,6,7
#else
  DB 8,9,10,11
#endif
If a block of code is to be assembled only if TEXAS is false, then you would use the exclamation-point operator:

#if !TEXAS
  DB 0FF
#endif
You may have conditional-assembly blocks either in macro definitions or in macro expansions. The only limitation is that if you have an #IF-block in a macro expansion, the entire block (i.e., the matching #ENDIF) must appear in the same macro expansion. You cannot, for example, define a macro that is a synonym for #IF.
To have your conditional-assembly block apply to the macro definition, you provide the block normally within the definition. For example:

X1 EQU 0

BAZ MACRO
#if X1
  DB 010
#else
  DB 011
#endif
#EM

BAZ
X1 EQU 1
BAZ
In the above sequence of code, the conditional-assembly block is acted upon when the macro BAZ is defined. The macro therefore consists of the single line DB 011, with all the conditional-assembly lines removed from the definition. Thus, both expansions of BAZ produce the object-code byte of 011, even though the local label X1 has turned non-zero for the second invocation.
To have your conditional-assembly block appear in the macro expansion, you must literalize the hash sign on each conditional-assembly line by giving two hash signs.
X1 EQU 0

BAZ MACRO
##if X1
  DB 010
##else
  DB 011
##endif
#EM

BAZ
X1 EQU 1
BAZ
Now the entire conditional-assembly block is stored in the macro definition, and acted upon each time the macro is expanded. Thus, the two invocations of BAZ will produce the different object bytes 011 and 010, since X1 has become non-zero for the second expansion.
Unless the conditional-assembly expression involves a macro operand, you will usually want your conditional-assembly blocks to be acted upon at macro definition time, to save symbol table space. You will thus use the first form, with the single hash signs.
Microsoft's MASM assembler has an abundance of confusing conditional-assembly directives, all of which are subsumed by A86's #IF expression evaluation policies. IFDEF is covered by A86's #IF directive in conjunction with its DEF operator. IFE and IFNDEF are duplicated by #IF followed by the exclamation-point (boolean negation) operator, followed by a DEF operator. IFB and IFNB test whether a macro operand has been passed as blank -- they can be simulated by testing the size of the operand with the #Sn operator. Finally, IFIDN and IFDIF do string comparisons of macro operands. This is more generally subsumed by the string-comparison capabilities of the operators EQ, NE, and =.
Examples of the translation of each of these constructs are given in the next chapter, on compatibility with other assemblers.
To facilitate the effective use of conditional assembly, A86 allows you to declare boolean (true-false) symbols in the command line that invokes the assembler. The declarations can appear anywhere in the list of source file names. They are distinguished from the file names by a leading equals sign =. To declare a symbol TRUE (value = 1), give the name after the equals sign. DO NOT put any spaces between the equals sign and the name! To declare a symbol FALSE (value = 0), you can give an equals sign, an exclamation point, then the name. Again, DO NOT embed any blanks! Example: if your source files are src1.8, src2.8, and src3.8, then you can assemble with TEXAS true by invoking A86 as follows.
a86 =TEXAS src1.8 src2.8 src3.8
You can assemble with TEXAS explicitly set to FALSE as follows.
a86 =!TEXAS src1.8 src2.8 src3.8
Note that if TEXAS is used only as a conditional-assembly control, then you do not need to include the =!TEXAS in the invocation, because an undefined TEXAS will automatically be interpreted as false.
A user pointed out to me that it's impossible to get an equals-sign into an environment variable. So A86 now accepts an up-arrow (hex 5E) character in place of an equals-sign for an invocation variable.
A86 will ignore an equals-sign by itself in the invocation line, without error. This allows you to generate assembler-invocation lines using parameters that could be either boolean variable names, or null strings. For example, in the previously-mentioned TEXAS-OKLAHOMA-nation example, the program could be invoked via a .BAT file called "AMAKE.BAT", coded as follows.
A86 =%1 *.8
You invoke A86 by typing one of the following.
amake texas
amake oklahoma
amake
The third line will produce the assembler invocation A86 = *.8; causing no invocation variables to be declared. Thus both TEXAS and OKLAHOMA will be false, which is exactly what you want for the rest-of-the-nation version of the program.
The usual prohibition against changing the value of a symbol that is not a local label does not apply to invocation variables. For example, suppose you have a conditional-control variable DEBUG, which will generate diagnostic code for debugging when it is true. Suppose further that you have already debugged source files src1.8 and src3.8; but you are still working on src2.8. You may invoke A86 as follows.
A86 src1.8 =DEBUG src2.8 =!DEBUG src3.8
The variable DEBUG will be TRUE only during assembly of src2.8, just as you want.
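Inside a source file, the diagnostic code controlled by DEBUG might be guarded as in the sketch below. DUMP_STATE is a hypothetical diagnostic routine, not something supplied with A86. With the invocation shown above, this block would be assembled in src2.8; the same block placed in src1.8 (where DEBUG is still undefined) or in src3.8 (after =!DEBUG) would be skipped.

```asm
#if DEBUG
  CALL DUMP_STATE    ; hypothetical routine: assembled only while DEBUG is TRUE
#endif
```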
I gave heavy priority to compatibility when I designed A86; a priority just a shade behind the higher priorities of reliability, speed, convenience, and power. For those of you who feel that "close, but incompatible" is like saying "a little bit pregnant", I'm sorry to report that A86 will not assemble all Intel/IBM/MASM programs, unmodified. But I do think that a majority of programs can, with a little massaging, be made to assemble under A86. Furthermore, the massaging can be done in such a way as to make the programs still acceptable to that old, behemoth assembler.
I have been adding compatibility features with almost every new version of A86. Among the features added since A86 was first released are: more general forward references, double quotes for strings, "=" as a synonym for EQU, the RADIX directive, the COMMENT directive, and the COMPAT.8 file containing macros for a number of segmentation-model directives. If you tried feeding an old source file to a previous A86 and were dismayed by the number of error messages you got, try again: things might be more manageable now.
Following is a list of the things you should watch out for when converting from MASM to A86.
MOV AL,CS:TABLE[SI]    ; if you want compatibility do this
CS MOV AL,TABLE[SI]    ; if not you can do it this way
MOVM MACRO DEST,SRC
  MOV AL,DEST
  MOV SRC,AL
ENDM
would be translated by eliminating the DEST,SRC declarations on the first
line, replacing DEST with #1 and SRC with #2 in the body of the definition,
and replacing ENDM by #EM -- the result is the MOVM macro that I presented
at the beginning of Chapter 11.
Other constructs have straightforward translations, as illustrated by the
following examples. Note that examples involving macro parameters have double
hash signs, since the condition will be tested when the macro is expanded,
not when it is defined.
MASM construct          Equivalent A86 construct

IFE expr                #IF ! expr
IFB <PARM3>             ##IF !#S3
IFNB <PARM4>            ##IF #S4
IFIDN <PARM1>,<CX>      ##IF "#1" EQ "CX"
IFDIF <PARM2>,<SI>      ##IF "#2" NE "SI"
IFDEF symbol            #IF DEF symbol
IFNDEF symbol           #IF ! DEF symbol
.ERR                    (any undefined symbol)
.ERRcond                TRUE EQU -1; followed by TRUE EQU cond
EXITM                   #EX
IRP ... ENDM            #RX1L ... #ER
REPT 100 ...ENDM        #RX1(100) ... #ER
IRPC ... ENDM           #CX ... #EC
The last three constructs, IRP, REPT, and IRPC, usually occur within macros;
but in MASM they don't have to. The A86 equivalents are valid only within
macros -- if they occur in the MASM program outside of a macro, you duplicate
them by defining an enclosing macro on the spot, and calling that macro once,
right after it is defined.
PAYREC STRUC
  PNAME DB 'no name given'
  PKEY  DW ?
ENDS

PAYREC 3 DUP (?)
PAYREC <'Eric',1811>
causes A86 to accept the STRUC definition, and to define the structure elements PNAME and PKEY correctly; but the PAYREC initializations need to be recoded. If it isn't vital to initialize the memory with the specific definition values, you could recode the first PAYREC as follows:
DB ((TYPE PAYREC) * 3) DUP ?
If you must initialize values, you do so line by line:

  DB 'Eric         '   ; 'Eric' padded with blanks to the 13-byte length of PNAME
  DW 1811
If there are many such initializations, you could define a macro INIT_PAYREC containing the DB and DW lines.
A86 has been programmed to ignore a variety of lines that have meaning to Intel/IBM/MASM assemblers; but which do nothing for A86. These include lines beginning with a percent sign, lines beginning with ASSUME, and lines beginning with any unrecognized symbol that begins with a period. If you are porting your program to A86, and you wish to retain the option of returning to the other assembler, you may leave those lines in your program. If you decide to stay with A86, you can remove those lines at your leisure.
In addition, there is a class of symbols recognized by A86 in its .OBJ mode, but ignored in .COM mode. This includes NAME, END, and PUBLIC.
Named SEGMENT and ENDS directives written for other assemblers are, of course, recognized by A86's .OBJ mode. In non-OBJ mode, A86 treats these as CODE SEGMENT directives. A special exception to this is the directive
segname SEGMENT AT atvalue
which is treated by A86 as if it were the following sequence.
segname EQU atvalue
STRUC
This will accomplish what is usually intended when SEGMENT AT is used in a program intended to be a COM file.
I consider this section a bit of a blasphemy, since it's a little silly to port programs from a superior assembler, to run on an inferior one. However, I myself have been motivated to do so upon occasion, when programming for a client not familiar with A86; or whose computer doesn't run A86, and who therefore wants the final version to assemble on Intel's assembler. Since my assembler/debugger environment is so vastly superior to any other environment, I develop the program using my assembler, and port it to the client's environment at the end.
The main key to success in following the above scenarios is to exercise supreme will power, and not use any of the wonderful language features that exist on A86, but not on MASM. This is often not easy; and I have devised some methods for porting my features to other assemblers.
PUSH2 EQU PUSH
PUSH3 EQU PUSH
POP2 EQU POP
POP3 EQU POP
I define macros PUSH2, PUSH3, POP2, POP3 for the lesser assembler, that PUSH or POP the appropriate number of operands. Then, everywhere in the program where I would ordinarily use A86's multiple PUSH/POP feature, I use one or more of the PUSHn/POPn mnemonics instead.
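A sketch of how this might look in practice follows. The register choices are arbitrary. The MASM-side macro shown in the comments is an assumption about how such a definition could be written for the other assembler; it is not taken from this manual.

```asm
; A86 side: PUSH3 and POP3 are synonyms for PUSH and POP (via the EQU
; lines shown earlier), so these lines use A86's multiple-operand forms.
  PUSH3 AX,BX,CX
  ...
  POP3 CX,BX,AX

; Assumed definition for the other assembler (MASM syntax, not A86):
;   PUSH3 MACRO R1,R2,R3
;     PUSH R1
;     PUSH R2
;     PUSH R3
;   ENDM
```

Because the source lines themselves are identical for both assemblers, only the definitions of PUSHn and POPn need to change when the program is ported.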
A86 has a powerful listing facility, that allows you to tailor the format of your listings to your specific needs. Because the listing pass adds a significant percentage to the time it takes A86 to execute, the listing is not produced by default. You must include either a +L switch, or the name of a file with a .LST extension on the A86 invocation line.
By default (+L but nothing else specified), an A86 listing file consists of a sequence of pages, each 59 lines long and 79 characters wide. Each page has a header line identifying A86 and its version number, giving the name of the program output file, the date and time of assembly, the name of the source file currently being listed, and a page number. Note that I am not so obnoxious as to splash my company name over the top of every page of your listing! If both a TITLE and a SUBTTL have been specified, the header consists of three content lines and one skipped line; otherwise, there are just two content lines. Each listing line has a sequential line number, a hex offset and hex object bytes, an indicator field with "i" for include files and "m" for macro expansions, and the source code itself. Nested includes have no special indication; nested macros are indicated by increasing indentation of the macro expansion line. A86 tries to be intelligent about the formatting of its listings: it will break up the wraparound of a long line at a word if reasonable. It will avoid breaking up a multi-line listing of less than 10 lines. It will break pages at sensible locations (described in detail shortly, under the PAGE directive). It will suppress blank lines at the top and bottom of pages (but it counts them in the sequential line numbering so you can tell they were there).
Five A86 switches, H, I, L, T, and W, allow you to control the existence and characteristics of titling, pagination, page-number format, page break control, source line numbering, hex object display, and source line display. The operation of these switches is described in detail in Chapter 3. Here are some examples of switch settings that will produce listings meeting some specialized needs.
+L21+T0+W12+I137 produces a listing consisting only of the source code, with the hex offset of each line placed to the left, and with the line truncated at 79 columns. Such a listing file would be ideal for viewing the source file while debugging on a primitive remote system that cannot run D86.
+L9+T0+W4+I128 produces a list file of just source code, with all conditional-assembly lines and skipped code removed. All titling, pagination, line numbers, and hex codes are eliminated, so the list file could be renamed as a source file, and reassembled. This might be useful for archival purposes, or for giving individualized versions of a source file to parties who don't need any of the conditional-assembly options you've programmed.
+L+I186+H15+W12 produces a list file that concentrates on the hex output, increasing the width to 16 bytes per line, showing up to 15 hex runover lines, and limiting the amount of source code shown.
In addition to the five switches just mentioned, A86 has a number of source-code directives that control aspects of listings.
The .NOLIST directive causes all subsequent listing to be suppressed, until a .LIST directive is seen. Line numbering continues during list suppression, so you will see the effects of the .NOLIST directive in the form of a jump in the line numbering of the listing.
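For example, a long table might be kept out of the listing as sketched below. BIG_TABLE and its size are hypothetical.

```asm
.NOLIST
; The lines between the two directives are still assembled,
; but they do not appear in the listing.
BIG_TABLE DB 256 DUP (0)
.LIST
```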
I also offer a macro-definition control code, #H, which causes the suppression of the listing of macro expansion lines. If #H appears anywhere within a macro definition, all calls to that macro will be listed as the macro call line only, showing the generated hex object bytes on that call line. This allows you to define macros that will be listed as if they were simple machine instructions. This effect can be achieved for all macros with an L switch setting that doesn't include the value 4 (see Chapter 3).
The TITLE directive specifies a title that will appear at the top of every page of the entire assembly. The title consists of the first 60 characters starting with the first nonblank after the word TITLE on the line. If you give more than one TITLE directive in a program, only the first will be recognized.
The SUBTTL directive specifies a subtitle to appear at the top of every page until another SUBTTL directive is given (or until the next file change if you have the +T16 switch-bit value set). If the directive is at the very top of the listing page, or it is shortly after an automatic page break, the subtitle will take effect on the page in which it appears. Otherwise, it will take effect at the next page.
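A minimal sketch of the two directives in use follows. The title and subtitle strings are made up. Comments are kept on separate lines, since the title text is taken from the remainder of the TITLE line.

```asm
; One title for the whole assembly; later TITLE lines would not be recognized.
TITLE Payroll utilities

; Subtitle for the pages that follow.
SUBTTL Record input routines
  ...
; A new subtitle, replacing the previous one.
SUBTTL Report output routines
  ...
```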
The PAGE directive serves several purposes. The word PAGE by itself will force a new page in the listing, at that point. A plus sign following the word PAGE causes a new page plus an incremented section number -- e.g. PAGE + on page 1-17 will cause a new page 2-1 to begin. The word PAGE followed by one or more constant parameters will set various A86 listing variables to the specified parameter values. The variables are as follows.
1. The length, in lines, of a listing page. Minimum is 10; maximum is 65535.
2. The width, in characters, of the maximum listing line.
3,4,5,6. The number of lines at the end of a page, less than which A86 guarantees will not be "widowed" after a page break of level 1,2,3,4, respectively.
Omitted parameters (either left off the end or via leading commas or 2 consecutive commas) will remain unchanged.
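The parameter positions can be illustrated with a few sketch lines. The numeric values are arbitrary examples, not recommended settings.

```asm
PAGE 66,132     ; parameter 1: page length 66 lines; parameter 2: width 132 characters
PAGE ,100       ; leading comma: length unchanged, width set to 100
PAGE 66,,5      ; two consecutive commas: width unchanged, parameter 3 set to 5
```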
The concept of "page break levels" is unique to A86 listings: it is my attempt to get A86 to make intelligent decisions about where to issue new listing pages. There are 4 page break levels, normally triggered by gaps (consecutive blank lines) in the source code, and by source-file changes. One- and two-line gaps cause breaks of level 1 and 2, respectively. Three-or-more-line gaps cause a break of level 3. A source-file change causes a break of level 4. If a page break occurs close to the end of a page, and a break of greater level hasn't already been marked, A86 will mark the point for a potential new page. If a page break of equal or greater level doesn't occur before the page is full, A86 will issue a new page at the marked point. The definition of "close to the end of the page" is 10,20,30, and 40 lines, respectively, for break levels 1,2,3,4. Those line counts can be changed by parameters 3,4,5,6 of the PAGE directive, as already described.
If you are intimidated by all this, or if you want to control page break levels manually, you may specify a T switch value that does not include the "auto-paging" option value 4. With that option disabled, page break levels will occur only at places where you issue a PAGE directive containing a special parameter value /1, /2, /3, or /4. The leading slash indicates that a page break of the indicated level is desired here. Such a parameter will typically be given by itself following PAGE; but, if you wish, it can be interspersed anywhere among other parameter values -- it will not be "counted" for the purposes of determining the other parameters' positions.
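The slash parameters might be used as in this sketch. The second line assumes, following the description above, that the slash parameter is skipped when the other parameters are counted; the values 66 and 132 are arbitrary.

```asm
PAGE /3          ; request a page break of level 3 at this point
PAGE 66,/2,132   ; /2 is not counted: 66 is still parameter 1, 132 is parameter 2
```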
When you specify the +X switch, A86 will create a cross-referenced symbol table listing of your program.
The output file, having a standard extension of .XRF, is an alphabetical listing of all the non-local symbols in your program. For each symbol, A86 gives its type, the file in which it was defined, its value, and a list of all procedures in which the symbol was used. If you print this file, you typically use the TCOLS tool to obtain a multi-column listing from A86's single-column output.
Note the use of procedure names to identify references -- this makes the cross-reference listing truly readable. Other cross-reference listings often give either line numbers, which are meaningless unless you go find the associated line; or a file name, which doesn't give you as much useful information.
Here is a more detailed description of the various pieces of information provided for each symbol.
m | for a simple memory variable |
+ | for an index memory quantity |
c | for a constant |
i | for an interrupt equate |
s | for a structure |
If there is a second letter, it is a size attribute: b for byte, w for word, f for far (or doubleword).
Observe that you must use the local-label facility of A86 to make this work. If you don't use local labels as your "place-marker" symbols, the symbol the cross-reference gives you will often be the name of the last "place-marker" symbol, not the name of the last procedure.
To save space, duplicate reference entries are denoted by a single entry, followed by "*n", where n is the decimal number of occurrences of that entry.
There is a tool, A86LIB.COM, available only if you are registered, that lets you build libraries of source files. To use A86LIB, you must first code and debug the A86 source files that you wish to include in your library. Then you issue the command A86LIB followed by the names of the source files. Wildcards are accepted; so you will typically want to gather the source files into a single directory, and use the wildcard specification. For example, if you use the filename extension .8 for your source files, you can issue the command A86LIB *.8 to create the library.
The library created consists of a catalog file, always named A86.LIB, together with the source files that you fed to A86LIB to create the catalog.
The following observations about A86LIB are in order.
You may update A86.LIB by running A86LIB again; either with new files or previously-recorded ones. You will need to update A86.LIB only if you add, delete, or change the names of your library procedures: you do not need to update A86.LIB if you merely modify the code within an existing procedure. If A86LIB is given a file it had already read in a previous run, then A86LIB marks all the symbols it had logged for the file as deleted, before rereading the file. Those symbols that are still in the file are then "unmarked". Thus, symbols that have been deleted from the file disappear functionally from A86.LIB, but still occupy space within A86.LIB. What I'm getting at is this: A86LIB will tolerate alterations in library files quite nicely; but for optimum storage efficiency you should delete A86.LIB and rebuild it from scratch any time you delete anything from the library. A86LIB is so fast that this is never very painful.
Once you have created a library with A86LIB, you access it simply by calling the procedures in it from your A86 program. When A86 finishes an assembly and sees that there are undefined symbols in your program, it will automatically look for copies of A86.LIB in the current directory (then in other directories, as described in the next section). If any of the undefined symbols are found in the A86.LIB catalog, the files containing them are assembled. You see this in the list of files output to the console by A86.
The subroutines in your library or libraries are effectively a permanent part of the A86 language. They can be called up effortlessly in your A86 programs. In time you can build up an impressive arsenal of library modules, making A86 as easy to program in as most high-level languages.
You may now have macros in your A86LIB library. Here's how it works: when A86 sees a new symbol at the beginning of a line, in a context where it would formerly have issued an error, A86 will first look in the A86LIB libraries for the symbol. If it's found, A86 will INCLUDE that library file on the spot, and then assemble the line. NOTE that if the macro is being called within a sequence of executable instructions, the library file must generate no output object code.
You can set an environment variable A86LIB to specify which drives or subdirectories contain A86.LIB files. The variable consists of a sequence of path names separated by semicolons, just like the PATH variable used by the operating system. For example, if you include in your AUTOEXEC.BAT file the line
SET A86LIB=C:\bin\lib;\tools\a86lib
then A86 will look for A86.LIB in the current directory, then it will look for C:\bin\lib\A86.LIB, then \tools\a86lib\A86.LIB. A86 will keep looking in all three catalog files, assembling the appropriate source files from any or all of them, until there are no more undefined symbols, or there are no more source files to assemble.
For every symbol in an A86.LIB catalog, there is recorded the name of the library file containing the symbol. The library file is assumed to be in the same directory as its A86.LIB file, unless a complete path name (starting with \ or a drive specifier) was fed to A86LIB when A86.LIB was created.
You may force A86 to assemble library files before moving on to more of your program's source files. You do this by placing a hash sign # (hex code 23) between file names in your invocation line. For example, suppose your program has two modules FIRST.8 and LAST.8. FIRST.8 calls subroutines from your library; but you need the library files assembled before LAST.8 is assembled. (You might want this because LAST.8 allocates memory space beyond the end of your program, which would be the end of LAST.8 if it were truly the last module.) You accomplish this by the invocation line:
A86 FIRST.8 # LAST.8
Note that there is never any need to force a library search at the end of your program modules: A86 always makes a library search there, if you have any undefined symbols.
You may now also force a library search from within a source file, by placing a line with INCLUDE by itself with no file names, into the source code. A86 will include any library files necessary to resolve any forward-references at the point of the INCLUDE.
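A sketch of such a source file follows. SHOW_HEX is a hypothetical routine assumed to be recorded in one of your A86.LIB catalogs; MAIN is likewise just an illustrative label. The comments are kept off the INCLUDE line, so that the directive really does stand by itself.

```asm
MAIN:
  CALL SHOW_HEX     ; forward reference: SHOW_HEX is not defined in this file
  RET

; Force the library search here, instead of after the last program module.
INCLUDE
```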
A86 signals successful assembly by returning an ERRORLEVEL of 0. If errors are detected during assembly, A86 returns an ERRORLEVEL of 1. If undefined symbols remain at the end of assembly, A86 returns an ERRORLEVEL of 2.
You usually correct this error by rearranging your code, or (better) by breaking intervening code off into subroutines. If you are using A386, you can specify a long conditional jump via Jcond LONG label. If desperate (and not using A386), you can replace "Jcond" with "IF cond JMP".
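The IF replacement can be sketched as follows, assuming that FAR_AWAY is a hypothetical label too distant for the conditional jump to reach. The A386 form is shown only as a comment, since it does not apply to A86.

```asm
; Original code, which draws the error:
  CMP AL,0A
  JZ FAR_AWAY          ; target is out of range

; Replacement using the IF statement:
  CMP AL,0A
  IF Z JMP FAR_AWAY    ; the JMP is executed only when the Z condition holds

; A386 only, per the text above:
;   JZ LONG FAR_AWAY
```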
This is also reported when you have two operands that are mismatched in size, and the mismatch is something other than Byte vs. Word. Example: MOV AL,D[0100].
If you mistakenly provide a macro-loop variable (#W, #X, #Y, or #Z) outside of any loop defining that variable, this error is detected when the macro is expanded, even though the error is in the macro definition.
The error is also reported if # occurs at the beginning of a line, and is not followed by IF, ELSEIF, ELSE, or ENDIF; or if a conditional assembly parameter is a built-in mnemonic e.g. #IF MOV . See Chapter 11 for the correct usage of the hash sign in both macros and conditional assembly.
If you don't wish to clobber the contents of any registers, and the operands are word-sized, you may PUSH the source operand and then POP to the destination operand: PUSH VAR2 followed by POP VAR1.
This error is also reported when you provide the name of a structure, or the name of an INT equate, in a place where a register or memory operand is expected.
L1:             ; first incarnation of L1
  JNZ >L1       ; reference to second incarnation
  JMP L1        ; ERROR -- which incarnation are we referring to?
L1:             ; second incarnation of L1
If you intended the JMP to be to the second L1, you should prepend a > to the L1, just like the JNZ. If you intended the JMP to be to the first L1, you must change one of the two label names so that their ranges don't overlap.
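The two corrections can be sketched as follows (L2 is a label name introduced here only for illustration; any unused local name would do):

; correction 1: the JMP is meant for the second L1
L1:
  JNZ >L1
  JMP >L1       ; forward reference, just like the JNZ
L1:

; correction 2: the JMP is meant for the first L1
L1:
  JNZ >L2       ; the second label has been renamed to L2
  JMP L1        ; unambiguous, since only one L1 is in range
L2: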
It's conceivable that this error could result in a D86 session, when you are using patch-memory mode to type in an extremely complicated program. In that case, you should type the program into a text file instead, and use A86 to assemble the text file.
Virtually all releases of A86 include bug fixes. If I don't say anything about a release, then it was essentially only bug fixes.
V2.10 | June 1986. | Initial public release of the MSDOS version of A86. The last previous version ran under the Xenix operating system on the Altos series of computers. For this "public offering", I cleaned up the invocation syntax, upgraded the error-reporting facility, and started adding compatibility features. |
V2.11 | June 1986. | Added RADIX command. |
V2.13 | July 1986. | Reduced memory requirements. |
V2.15 | August 1986. | Implemented COMMENT directive for compatibility; added floating point instruction set and DQ and DT directives. |
V2.16 | August 1986. | Made internal changes to accommodate forward referencing in D86's patch-memory mode. |
V2.18 | November 1986. | |
V2.90 | March 1987. | Test release for .OBJ support. |
V3.00 | April 1987. | Major upgrade. Added support for linkable .OBJ files, long constants and floating-point constants, A86LIB library tool and A86LIB support, ability to forward-reference variables, 286 protected-mode and NEC-specific instructions, options not to insert errors in source, long forward JMP for local labels, and default decimal, "=" equate compatibility feature, double-quoted strings, and parentheses no longer required for most DUP right operands. |
V3.01 | April 1987. | Added "S" suppress-symtab and "C" case-sensitivity switches. |
V3.05 | June 1987. | Added recognition of SEGMENT AT in non-OBJ mode. |
V3.07 | July 1987. | Added features necessary for Turbo C support (+c, +f, +F switches; ignore DGROUP:). Generalized the environment variable to include macro files. Added the ampersand feature. Made = compatible with MASM. |
V3.09 | August 1987. | Legalized MOV segreg,immediate. Duplicated MASM functionality for case-sensitive mode (A86's +C switch). |
V3.10 | September 1987. | Added a printed version of the manual. Added +c switch, reinstating case sensitivity during assembly, but this time without sensitivity in built-in symbols. |
V3.11 | November 1987. | Added the SEG operator for compatibility with Turbo C, and made it possible to define relocatable segments called CODE, DATA, or STACK, for compatibility with Turbo Pascal. |
V3.12 | February 1988. | Changed the format of SYM files, so that they are much smaller yet hold more information. Added macro features: #V value operator, #S size operator, #N number operator, #EX exit directive, string comparison of operands, and large operand numbers (up to 127). |
V3.13 | March 1988. | Made memory management more flexible, to allow A86 to run with less available memory. |
V3.15 | May 1988. | Allowed up-arrow in place of equals-sign in invocation equates. Allowed MOV mem,mem and XCHG of a variety of new forms, generating sequences of instructions to implement the unavailable forms. |
V3.17 | June 1988. | |
V3.18 | July 1988. | For compatibility: allowed OFFSET segname, and implicitly converted a constant with a segment override into a memory type. |
V3.19 | August 1988. | |
V3.20 | July 1989. | Made internal redesign of handling of size-override operators (B, W, D, F, etc.) so they are handled more consistently. Outlawed first DATA SEGMENT without a starting ORG statement, forcing an explicit ORG 0 for future compatibility. |
V3.21 | August 1989. | |
V3.22 | January 1990. | Added support for additional coprocessors: the 80387 and the IIT-2C87. Made numerous minor changes to enhance MASM-compatibility. |
V3.70 | January 1994. | Test release for INCLUDE support, forward ORGs, default ORG for DATA SEGMENT to the end of the program, listing files, macros in A86LIB, K numeric base, the DEF and REF operators, numeric operands to MOVx and STOSx, enforcement of processor-specific instructions, forward references in complicated expressions, and symbols beginning with a period. |
V4.00 | December 1994. | "Official" public release with all the new features mentioned in V3.70 above. Added COMPAT.8 file to implement some MASM directives as A86 macros. |
V4.01 | March 1995. | |
V4.02 | September 1995. | |
287 directive (.287)
386 indexing, 25
387 support, 37
A-after operator in macros, 70
A86 environment variable, 15
A86.LIB file, 83
A86.LIB library catalog, 9
A86LIB environment variable, 84
A86LIB library tool, 83
AAD with operand, 23
AAM with operand, 23
about the author, 8
ABS operator in EXTRN, 61
absolute segments in OBJ mode, 64
address listing control, 13
address override byte, 27
address, my, 5
align a list of variables, 63
align specification, 63
alignment using EVEN, 52
allocation directives, 52
ampersand, specifies standard input, 15
AND expression operator, 47
angle brackets in MASM, 79
arithmetic on floating-point numbers, 38
arithmetic, 32-bit and LEA, 25
assembler variables, 56
assembler, learning, 11
assertion checking, 56
ASSUME directive, 26
asterisk multiplication operator, 46
AT combine type, 64
at-sign @, in symbols, 17
attribute operators/specifiers, 47
AUTOEXEC.BAT file, 15
automatic paging control, 14
automatic paging controls, 82
Ayala, Kenneth, 11
B operator in EXTRN, 61
B override expression operator, 47
B-before operator in macros, 70
base registers, 24
base, default, 78
based structure example, 24
based structures, 54
bases for numbers, 44
bases, ambiguous, 44
batch file controls, 75
BCD numbers, 39
benefits of registration, 6
BIN extension for object files, 51
BIN extension, 11
binary base, 44
Binary Coded Decimal numbers, 39
biography, 8
BIOS interface, books on, 11
BIT expression operator, 46
(pg 103)
block structure in MASM, 79
books on assembler, recommended, 11
Boolean negation operator, 47
BP indexing size anomaly, 27
brackets, 48
British contact, 5
bugs, reporting, 8
built-in constant names, 39
built-in symbols, equates to, 55
BY operator, 46
BYTE align type, 63
BYTE override expression operator, 47
C programming language, linking to, 58
C switch, 12
C-loops in macros, 69
capacity, 7
capacity, source file, 16
case sensitivity, 12
case-insensitive comparisons, 47
catalog file A86.LIB, 83
categories of A86 elements, 17
cb specifier, 28
changing the default base, 44
character loops in macros, 69
characters allowable in symbols, 17
characters in A86 language, 17
choices for 87 operands, 39
class name, specifying, 64
classes, 60
clear register macro, 67
clear-register macro, 69
CODE ENDS directive, 51
code generation of forward references, 54
code label specifier, 28
CODE SEGMENT directive, 51
CODE segment, link to Pascal, 64
colon operator, 49
colon, deciding when to use, 19
columnar output, 9
COM extension, 12
COM programs, how to detect, 77
combine specification, 63
combine types, 63
combining switches, 15
COMMENT directive, 18
comments in macros, removal of, 67
comments, 17
COMMON combine type, 64
comparison of strings, 47
COMPAT.8 macro file, 77
compatibility, 77
compression of macro text, 67
Compuserve section, 8
computation models, 58
conditional assembly and macros, 74
conditional calls, see IF, 21
conditional jump, far, see IF, 21
conditional line filtering, 81
conditional returns, 21
(pg 104)
conditionals, list control, 13
constant operand to FLD, 39
constants, floating, 45
constants, format of, 44
constants, large, 53
constants, overview, 19
contacting me, 8
control-character notation, 46
controls, invocation, user-definable, 75
converting MASM programs, 77
cp specifier, 28
CPU-specific instructions, 13
crashes, on lack of FWAIT, 37
creating programs to assemble, 11
credit cards, 5
cross reference demo, 9
cross-reference listing, 82
cross-reference output switch, 15
cv specifier, 28
D operator in EXTRN, 61
D override expression operator, 47
D switch, 12
DATA ENDS directive, 51
DATA SEGMENT directive, 51
DATA segment, link to Pascal, 64
DB directive, 52
DD directive, 52
DD examples, 53
DEC, multiple and numeric operands, 21
decimal base, 44
decimal output of macro operands, 71
DEF operator, 49
default base, changing, 44
default base, decimal, 12
default bases, 44
default forward-reference, 13
default operand in a macro, 50
default output file name, 16
default segment registers, 25
default segment, OBJ mode, 65
defined symbols, testing for, 49
defining macros, 67
demonstration, 9
description of 87 instructions, 40
description of instructions, 29
Dettman, Terry, 11
digits in file names, 16
digits, hex, 44
directives in A86, 51
displacement field, 26
displaying macro expansions, 73
division operator, 46
dollar sign $, in symbols, 17
dollar-sign specifier, 49
DOS interface, books on, 11
double hash signs ## in macros, 67
double hash signs ## in macros, 75
double quote-marks in strings, 53
double-precision, 39
(pg 105)
doubleword indexing, 25
doubleword pointer initialization, 53
DQ directive, 52
DQ example, 53
DT directive, 52
DT example, 53
DUP construct, 53
duplicate definitions, 55
DW directive, 52
DWORD align type, 63
DWORD override expression operator, 47
E switch, 12
e-mail address, 8
EA byte, 26
eb specifier, 28
EBP indexing size anomaly, 27
editing programs, 11
effective addresses, 24
effective addresses, encoding, 26
electronic mail, 8
ELSE operator, 50
ELSE, 74
ELSEIF, 74
EM end-of-macro symbol, 67
emulation, floating-point, 38
encoding of effective addresses, 26
encoding of floating-point numbers, 45
END directive, 62
end of a macro, 67
end of file, 62
END used as an operand, 54
end-of-program memory, 51
ENDIF, 74
ENDM, 78
ENDP directive, 56
ENDS directive, OBJ mode, 64
ENDS directives in COM mode, 51
English contact, 5
environment variable A86LIB, 84
environment variable, A86, 15
EQ expression operator, 47
EQ in comparing strings, 47
EQU directive, 55
equals-sign directive, 56
equals-sign string compare, 47
equates to built-in symbols, 55
equates to interrupts, 55
ER end-of-repeat symbol, 69
ERDEMO.BAT batch file, 9
ERR and ERRcond, 78
ERR extension, 12
error file redirection, 12
error messages, 7
error messages, explanation, 86
ev specifier, 28
evaluating macro operands, 71
EVEN directive, 52
EX exit macro specifier, 72
examples of floating constants, 45
(pg 106)
examples of numbers, 17
examples of type matching, 19
examples of useful memory accesses, 26
exclamation-point operator, 47
exclusive features, 21
EXE programs, how to detect, 77
exiting from middle of macro, 72
EXITM simulation, 72
EXMAC, what happened to, 73
explicit EXTRNs, 13
explicit EXTRNs, forcing, 62
explicit OBJ specification, 58
explicit public names, 61
explicit WAITs, 37
exponent specifier, 45
expressions and forward references, 54
expressions in conditional assembly, 73
expressions, overview of, 45
extended-precision operands, 39
extensions of source files, 15
external names and LINK, 59
extra coprocessor support, 37
EXTRN directive, 61
EXTRNs, explicit, 13
F override expression operator, 47
F switch, 12
F switch, 38
FALSE in conditional assembly, 73
FALSE return value, 47
FAR operator in EXTRN, 61
FAR override expression operator, 47
far-label constants, 49
FBANK instruction on IIT-2C87, 37
FDISI instruction, 37
features, exclusive, 21
FENI instruction, 37
file breaks, listing control, 14
file in which a symbol was defined, 82
file lists, 15
file maintenance, 15
file names, digits in, 16
files, source, 11
filtering conditional lines, 81
FLAT directive, 57
FLAT segment specification, 64
FLD, immediate operand, 39
floating constants, examples of, 45
floating point operands, choices for, 39
floating-point constants, format of, 45
floating-point emulation, 38
floating-point instructions, 37
floating-point operand types, 39
floating-point stack, 38
footprint, code-generation, 6
forcing explicit EXTRNs, 18, 62
forcing library lookup, 84
format of assembler source lines, 18
format of macros, 67
formfeed control, 14
(pg 107)
FORTRAN, 64
forward references, 54
forward references, 78
forward-reference typing, 13
fragments, 60
free memory, allocating, 51
FSETPM instruction, 37
FSTSW AX form, 37
FWAIT instruction, 37
G switch in EXTRN, 62
G switch, 12
gaps in code, page breaks at, 82
GE expression operator, 47
Great Britain contact, 5
greater-mark > for local symbols, 22
GROUP directive, 65
groups, reason for, 60
GT expression operator, 47
H switch, 13
hash sign # in invocation, 84
hash sign #, conditional assembly, 73
hash signs # in macros, 67
hash signs # in macros, 75
hash signs #, literalizing in macros, 67
hex address listing control, 13
hex object lines, extra, 13
hexadecimal base, 44
hexadecimal digits, 44
HIGH operator, 46
high-level-language computation models, 58
history of A86, 93
I switch, 13
ib specifier, 28
IBM, 77
IEEE standard for floating-point, 45
IF conditional assembly symbol, 73
IF statement, 21
IFB, 78
IFDEF, 78
IFDIF, 78
IFE, 78
IFIDN, 78
IFNB, 78
IFNDEF, 78
IIT-2C87 support, 37
immediate operand to FLD, 39
implicit public names, 61
INC, multiple and numeric operands, 21
incentives to register, 6
INCLUDE directive, 57
INCLUDE file listing control, 15
INCLUDE with no file name, 85
indefinite repeats, 69
indentation of source listing, 13
indentation of wraparound lines, 14
index expressions, 45
index registers, 24
indexed memory, 24
indexing, A386, 25
(pg 108)
inferior assemblers, 77
inferior assemblers, porting to, 79
initializations of floating-point, 38
instruction set chart, explanation, 28
instruction set, 87-family, 40
instructions on specific CPUs, 13
instructions, list of, 29
instructions, special, 29
integer operands to 87, 39
Intel assembler, 77
Intel meeting, 59
intermediate numeric results, 39
Internet mail address, 8
interrupt equates, 55
interrupt handlers, 77
invocation variables, 75
invoking A86, 11
IRET as an operand, 21
IRP and IRPC functionality, 67
IRP, 78
IRPC, 78
iv specifier, 28
JHASH example, 72
JMP, long default, 12
K base for numbers, 44
keyboard entry coding example, 68
L last-operand in macros, 69
L switch for listing, 81
L switch, 13
L2E and L2T constants, 39
LABEL directive, 57
labels, examples, 19
large constant initialization, 53
large macro operand numbers, 72
LARGE model of segmentation, 58
last-operand in macros, 69
LE expression operator, 47
LEA and 32-bit arithmetic, 25
LEA instruction, optimizing, 12
LEA optimization, 23
leading underscore in C, 58
Legal Terms, 5
length byte, generating in macro, 71
length of a symbol name, 17
LG2 constant, 39
library search, forcing, 84
library search, trigger in source, 57
line numbers, suppressing, 13
line-format, 18
LINES.8 library file, 9
LINK program, 58
linkage, 58
linker, where to get a, 11
list controls, suppressing, 13
LIST directive (leading period), 81
list of instructions, 29
listing control directives, 81
listing control switches, 81
listing hex object bytes, 13
(pg 109)
listing indentation of source, 13
listing of 87 instructions, 40
listing of cross-references, 82
listing, how to activate, 13
listing, specific formats, 81
listings in A86, 81
LN2 constant, 39
loading named segments, 77
local labels in macros, 72
local labels, simulating, 80
local symbols, 22
local symbols, specifying, 56
location, this, specifier, 49
logarithmic constants, 39
logical operators, 47
long default JMP, 12
LONG override expression operator, 48
looping in macros, 69
loops with large index, 72
LOW operator, 46
lower case letters in symbols, 12
LST file, producing, 81
LT expression operator, 47
m specifier, 28
macro compatibility, 78
macro default operands, 50
macro exiting from within loop, 72
macro expansions, displaying, 73
macro file, default, 15
macro libraries, making, 84
macro listing global control, 13
macro loops, skipping increments, 70
macro operand substitution, 67
macros and conditional assembly, 74
macros, defining, 67
main module, 59
MAIN symbol, 62
maintenance of files, 15
manual, scope of, 11
MASM compatible CODE, DATA, 64
MASM conditional assembly, simulating, 75
MASM-compatibility, 77
matching of types, examples, 19
matrix multiplication on IIT-2C87, 37
maximum length of a symbol name, 17
maximum source file size, 16
meeting at Intel, 59
memory alignment using EVEN, 52
memory allocation directives, 52
MEMORY combine type, 64
memory forms, overlooked, 26
memory operand forms to 87 instructions, 39
memory requirements, 16
memory variables, specifying, 24
memory, allocating free, 51
memory-resident programs, 77
menu systems and A86, 15
Microsoft, 77
minus operator, 46
(pg 110)
MIX tool, compatibility, 12
mnemonics, 86-family, 29
mnemonics, floating-point, 40
mnemonics, one for many instructions, 19
MOD modulo operator, 46
model of segmentation, grotesque, 59
models of segmentation, 58
ModRM byte, 26
module names, 60
modules, object, 59
MOV immediate into seg regs, 22
MOV of memory operands, 22
MOV of segment registers, 22
MOV substitute for LEA, 23
MOV with three operands, 22
move memory macro example, 67
MOVSx, numeric operand to, 21
MSDOS.8 library file, 9
MTCOLS.BAT batch file, 9
multiple allocation using DUP, 53
multiple files in OBJ mode, 58
multiple increments in macro loops, 70
multiple operands to PUSH,POP,INC,DEC, 21
multiply by 10 coding example, 58
multiply operator, 46
NAME directive, 60
name of output files, 11
NE expression operator, 47
NE in comparing strings, 47
NEAR operator in EXTRN, 61
NEAR override expression operator, 48
NEC chips and AAD, 23
NEC chips, special instructions, 29
NEC instructions, allowing, 13
negation, Boolean, 47
negative R-loops in macros, 70
nested IF blocks, 74
nesting of loops in macros, 71
new file listing control, 14
NIL prefix, 55
NOLIST directive (leading period), 81
non-combinable segments, 64
NOP in even directive, 52
NOT expression operator, 47
null invocation variable names, 75
null operands to macros, 68
number operands in expressions, 45
numbering, suppressing, 13
numbers, examples, 17
numbers, examples, 44
numbers, floating, 45
numbers, format of, 44
numeric operands to INC, DEC, 21
numeric operands to STOSx, MOVSx, 21
O switch, 11
O switch, 13
O switch, 58
OBJ extension, 11
OBJ file generation, 13
(pg 111)
OBJ internal optimization, 12
OBJ production made easy, 58
object file name, 11
object modules, 59
obnoxious MASM headers, 81
octal base, 44
OFFSET override expression operator, 48
Oklahoma, 74
online support, 8
opcodes, 86-family, 29
opcodes, 87-family, 40
operand choices for 87 instructions, 39
operand number, generating, 71
operand override byte, 27
operand types to 87 instructions, 39
operating system requirements, 16
operation of A86, 11
operator precedence, 50
optimized LEA instruction, 23
OR expression operator, 47
ORG directive, 51
outer segment, OBJ mode, 65
output file name, default, 16
output files, naming, 11
overlooked memory forms, 26
overrides, 32-bit, examples, 27
overrides, segment, 25
overrides, segment, 77
overview of A86, 7
overview of expressions, 45
P switch, 13
PAGE align type, 63
page breaks, automatic, 82
page breaks, manual, 82
PAGE directive, 82
page numbers, column control, 15
PAGE.8 program, 9
PAGE.BAD source file, 9
pagination control switch T, 14
paging, automatic, 14
PARA align type, 63
parameters, MASM local, 78
parenthesized operand numbers, 72
Pascal segment names, 64
Pascal, linking to, 59
passing macro operands by value, 71
Pentium instructions, 29
period as first character of a symbol, 17
period operator, 46
permanent switch settings, 15
phone number, my, 5
PI constant, 39
piping file names to A86, 15
plus operator, 46
POP, multiple operands, 21
POPA simulation for 8088, 13
port programs to inferior assemblers, 79
Power C compatibility, 12
powers of ten, 45
(pg 112)
precedence of operators, 50
prices, 5
printer eject program, 9
PROC directive in MASM, 78
PROC directive, 56
PROC operator in EXTRN, 61
procedure-level summary listings, 82
procedures, 56
processor control, 13
processor-specific instructions, 29
program invocation, A86, 11
program location specifier, 49
program size in expressions (END), 54
program starting location, OBJ mode, 62
program, memory beyond the, 51
programs, how to create, 11
prompt for file names, 15
PUBLIC combine type, 63
PUBLIC directive, 61
public names and LINK, 59
PUSH multiple operands, simulating, 80
PUSH, multiple operands, 21
PUSHA simulation for 8088, 13
Q operator in EXTRN, 61
Q override expression operator, 47
question mark ?, in symbols, 17
question-mark specifier, 53
quote-marks in strings, 53
quoted string macro operands, 68
QWORD override expression operator, 47
R-loops, negative, 70
RADIX directive, 44
rb register specifier, 28
red tape, 58
red tape, 7
redefinable symbols, 22
redefining symbols, 55
redirection of error files, 12
REF operator, 49
referenced symbols, testing for, 49
references of symbols, listing, 82
registers, general, 24
registration benefits, 6
registration benefits, 83
register set, 18
relational operators, 47
release history, A86, 93
relocation and linkage, 58
repeat counts to string instructions, 21
repeating code using DUP, 53
REPT directive, simulating, 72
requirements, system, 16
reserved symbols, 17
RET instruction, meaning of, 56
RET(F) as an operand, 21
RETF instruction and PROC, 56
REV.8 source file, 9
reversing strings example, 9
rotate immediate simulation for 8088, 13
(pg 113)
rv register specifier, 28
S switch, 14
scaled indexing, 25
scientific notation, 58, 45
section number control, 14
section numbers, controlling, 82
SEG operator, 65
SEGMENT AT, non-OBJ, 79
SEGMENT directive, non-OBJ mode, 79
SEGMENT directive, OBJ mode, 62
segment override colon operator, 49
segment overrides, 25
segment overrides, 77
segment registers managed by GROUP, 65
segment registers, default, 25
segment registers, special MOVs, 22
segmentation and memory access, 25
segmentation models, 58
segments in A86, 51
segments, loading named, 77
selective NOLIST for macros, 81
shareware distribution, 5
shift immediate simulation for 8088, 13
shifting expression operators, 46
SHL and SHR expression operators, 46
SHORT override expression operator, 48
simple macro syntax, 67
single-precision floating point, 39
size of effective addresses, 26
size of macro operands, 71
size of program in expressions (END), 54
size of source files, 16
size of structures, 49
skipped lines, suppressing, 13
slash division operator, 46
slash specifier, 26
SMALL model of segmentation, 58
source file as default TITLE, 14
source file library for sale, 7
source files, 11
source libraries, 83
special instructions, 29
speed, 7
square brackets operator, 48
ST floating-stack operator, 49
STACK combine type, 63
STACK segment, relocatable, 63
stack segments in OBJ mode, 63
stack, floating-point, 38
standard input command tail, 15
star multiplication operator, 46
starting location, OBJ mode, 62
STOSx, numeric operand to, 21
strategies for file maintenance, 15
string allocation, 53
string comparison operators, 47
STRUC directive, 54
STRUC, implicit via SEGMENT AT, 79
structure, based, example, 24
(pg 114)
structured programming constructs, 21
structures and MASM, 79
structures initialization, 79
structures, size of, 49
subdirectories of programs, 16
substitution of macro operands, 67
subtitle default to source file, 14
subtraction operator, 46
SUBTTL subtitle directive, 82
summary of procedure calls, 82
suppressing line numbers, 13
suppressing list control, 13
suppressing skipped lines, 13
suppressing SYM file, 14
switch settings, permanent, 15
switches, combining, 15
switches, user-definable, 75
SYM file, suppressing, 14
symbol listing control, 13
symbols, redefining, 55
symbols, allowable characters for, 17
system crashes on lack of FWAIT, 37
system requirements, A86, 16
T operator in EXTRN, 61
T override expression operator, 47
T switch, 14
tabs, recommendation against, 18
TBYTE override expression operator, 47
TCOLS.8 source file, 9
telephone number, my, 5
terms, legal, 5
TEXAS invocation switch, 75
Texas, 74
TEXT segment name, 65
THIS specifier, 49
tips for memory access, 26
TITLE default to source file, 14
TITLE directive, 82
titling control switch T, 14
TO in invocation, 11
TRUE in conditional assembly, 73
TRUE return value, 47
truncation of listing lines, 14
Turbo Pascal segment names, 64
Turbo Pascal, linking to, 59
type matching, examples, 19
TYPE operator, 49
types in the A86 language, 18
types, assumed, 58
UND undefined symbols file, 62
undefined symbol types, assumed, 58
undefined symbols listing in OBJ mode, 62
underscore, in symbols, 17
underscore, leading, in C, 58
underscores within numbers, 44
up-arrow symbol and invocation equates, 75
USAGE.8 library file, 9
use specification, 64
USE16 and USE32 directives, 57
(pg 115)
USE16 and USE32 segments, 64
user symbols, 17
USES clause, converting, 78
value, passing operands by, 71
variable operands in expressions, 45
variables declared at invocation, 75
variables, assembler, 56
variables, examples, 18
variables, forward references of, 54
verbose forms, floating-point, 39
verbose PROC, 56
version history, A86, 93
W operator in EXTRN, 61
W override expression operator, 47
W switch, 14
WAIT instruction, 37
Wettstein, Greg, 8
widowed listing lines, avoiding, 82
wild cards, order of, 16
WORD align type, 63
WORD override expression operator, 47
wraparound listing control, 14
X specifier for numeric bases, 44
X switch, 15
XCHG of memory operands, 22
XCHG of segment register, 22
XOR expression operator, 47
XREF output switch, 15
XRF files, producing, 82