Oh boy. Now we get to the fun part. Here is where we try to make it work. We will consider several configurations of memory, but first it might be good to examine the environment we wish to implement in: the Z80 CPU.
The Z80 is an 8 bit microprocessor. It uses 16 bits for memory addressing giving it the ability to address 64K of memory. This is not much by today's standards. It is possible to make the Z80 address more memory by adding external circuitry. With this circuitry it would be possible to make the Z80 address as much memory as we want; say 4GB. A Z80 addressing 4GB of memory might not be quite practical. After all, what would it do with it? However, something a little more down to earth might be useful; say 256K, or 1-4MB.
The first thing we must understand is this. No matter what external circuit we come up with, the Z80 will only address 64K at any given moment in time. What we need is a way to change where in the PHYSICAL address space the Z80 is working from moment to moment. This function is called memory management. The circuit that performs the memory management is called an MMU, or Memory Management Unit.
Today everyone is probably experienced with running 386MAX, QEMM, or HIMEM on their PCs. This is the memory management software that runs the MMU in our 386/486/Pentium processors. The PC uses memory management for a different function than what we might use it for in the Z80, since the 386/486/Pentium processors are inherently capable of directly addressing a full 4GB of memory. With the Z80, we need an MMU just to get at all of the memory we may have in the system.
The basic idea of how a memory manager works is this. There is a large PHYSICAL memory space defined by the amount of memory plugged into the system. If you plugged in 256K of memory, then your physical address space is 256K, and so on. When a memory manager maps memory for the Z80 processor, the 64K address space of the Z80 becomes the LOGICAL address space.
The logical address space is broken up, by the MMU, into small chunks. The next thing we must decide is how big the chunks are. They can be as small as we want, say 512 bytes. For our Z80's 64K logical address space we would need 128 pages in our MMU to implement this. If we are building a multitasking system some or most of these MMU pages may need to be rewritten each time we have a task switch. This greatly increases system overhead. We want the task switch to be accomplished as fast as possible. The code we execute during the task switch doesn't contribute to the running of our application task. It is just overhead. We would also need to design hardware that could provide that many pages in our MMU. We could certainly do this, but it would increase our chip count, and the MMU may not be fast enough for our needs.
Ok, 512 bytes per page is too fine for our needs. Let's look at 4K pages. Again, for our Z80's 64K logical address space, we would now need 16 pages. This sounds a lot better. Very fast hardware register file chips are available with 16 registers, that will meet our needs; the 74xx189. The 74xx189 is a 16 by 4 register file chip. You can stack them to get any width you need.
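The page-size trade-off above is just arithmetic, and it can be tabulated. The following Python sketch is only a calculator for that arithmetic; the function name mmu_geometry is made up for illustration and is not part of the design.

```python
LOGICAL_SPACE = 64 * 1024  # the Z80's 16 address lines

def mmu_geometry(page_size):
    """Return (pages, select_lines, offset_lines) for a given page size."""
    pages = LOGICAL_SPACE // page_size
    select_lines = pages.bit_length() - 1   # upper address lines that pick a page
    offset_lines = 16 - select_lines        # lower lines that pass straight through
    return pages, select_lines, offset_lines

for size in (512, 4096):
    pages, select, offset = mmu_geometry(size)
    print(f"{size:5d}-byte pages: {pages:3d} MMU registers, "
          f"A{offset}-A15 select the page, A0-A{offset - 1} are the offset")
```

With 4K pages the four uppermost address lines, A12 through A15, pick the MMU register, and A0 through A11 go to memory unchanged.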
As we said earlier, if we are using 4K pages, we will need 16 of them to accommodate the 64K logical address space of the Z80 CPU. To address 16 pages in our external MMU we will need four address lines. We will use the uppermost address lines on the Z80. The block diagram of our MMU is shown in the following illustration.
Figure 17 shows the basic Z80 CPU implemented with an MMU. The MMU is made from three 74xx189 register files. These three parts develop 12 address lines. When put with the low order 12 address lines from the Z80, we have 24 address lines, enough to address 16MB of memory. If we limited our design to 1MB of RAM we could eliminate one of the 189's and simplify the control software somewhat. For the rest of our discussion we will assume three 189's.
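To make the address arithmetic concrete, here is a minimal Python model of the translation the hardware performs. It is a sketch, not a description of the actual circuit: the name translate and the list-of-16-integers representation are assumptions, and the list holds the page numbers as they appear on the physical address lines.

```python
PAGE_BITS = 12                       # 4K pages: A0-A11 pass straight through
PAGE_MASK = (1 << PAGE_BITS) - 1

def translate(mmu, logical):
    """Map a 16-bit logical address to a 24-bit physical address.

    mmu is a list of 16 page numbers, each 12 bits wide (three 4-bit 189's).
    """
    slot = (logical >> PAGE_BITS) & 0xF   # A12-A15 pick one of 16 registers
    page = mmu[slot] & 0xFFF              # 12 bits from the register file
    return (page << PAGE_BITS) | (logical & PAGE_MASK)

same_page = list(range(16))               # every slot holds its own number
print(hex(translate(same_page, 0xF123)))  # 0xf123

remapped = same_page.copy()
remapped[0xF] = 0x010                     # last slot now shows physical page 10H
print(hex(translate(remapped, 0xF123)))   # 0x10123
```

The largest value the function can return is 0FFFFFFH, which is where the 16MB figure comes from.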
One of the first things we must deal with is initializing the MMU. If the MMU doesn't come up in a defined state at power up, and it doesn't, then we must somehow initialize it. This also means that our ROM accesses must not go through the MMU because we can't depend on it decoding the ROM addresses at power up. We'll look at how to get around this in a minute. For now, just assume that we can execute instructions from the ROM to initialize the MMU.
The first thing we must do to the MMU is to bring it to a known state. Figure 18 shows the MMU mapping we may wish to have at start up. This is simply a one to one mapping. The lower 64K of the physical address space is mapped onto the lower 64K logical address space. Now we can have a stack, make subroutine calls, service interrupts, etc. All the things a Z80 likes to do.
After we have initialized the MMU to its default state we can start our application program running. For the sake of discussion let's say that we are designing a data logging system. Further, let's say that this system uses a BASIC interpreter in ROM, and that the application program is also in ROM. The Z80's logical address space may look like this.
In figure 19 we see a possible layout of the Z80's 64K logical address space for our data logging application. We haven't said anything yet about how this is mapped to the physical memory. If we use the default mapping we set up in figure 18 we're almost there. We need to account for the ROM and reserve the last page for mapping. That's all there is to it. RIGGGHT!!! Welllll. That's almost true.
OK. Let's deal with the ROM first. What I would do with it is get rid of it. We use it just long enough to get the CPU up and then switch it out, never to use it again; until the next hardware reset, or power up. Once we have the MMU initialized, and the memory manager running, we really don't want any memory active that is not going through the MMU. Remember that we said the ROM couldn't go through the MMU. This is one of the chicken/egg problems. We can't decode the ROM addresses from the MMU because it comes up in an unknown state. If we can't decode the ROM addresses from the MMU then we have no way to execute code so we can initialize the MMU so it can decode ROM addresses. Quite a mess huh? Well, there is a simple solution.
When the Z80 is reset we set a flip-flop that allows ALL memory reads, regardless of the address, to go to the ROM. The ROM has its address pins tied DIRECTLY to the Z80 CPU chip pins, not to the MMU. Now we can execute code at reset. After a quick thought you say "Hey now. If all reads go to the ROM, how do we access our stack?" The answer is "We don't." This is just a very temporary state we go through in bringing up the processor. The following code will establish the default state shown in figure 18.
/* Also available in LIST1.ASM */
; INITIALIZE THE MMU TO THE DEFAULT STATE
; THIS WILL ALLOW A ONE TO ONE MAPPING FROM
; PHYSICAL TO LOGICAL ADDRESSES. THE
; FIRST 64K OF DRAM IS MAPPED INTO THE Z80'S
; LOGICAL ADDRESS SPACE.
;
; THE FOLLOWING TABLE CONTAINS THE VALUES
; TO BE WRITTEN TO THE MMU ON STARTUP.
;
;
MMU.START:   DW   0,1,2,3,4,5,6,7,8,9,0AH,0BH,0CH,0DH,0EH,0FH ; DEFAULT MAPPING
;
MMU.LO:      EQU  ##              ; I/O PORT ADDRESS FOR LOW TWO 189 CHIPS
MMU.HI:      EQU  ##              ; HIGH 189 CHIP
;
KILL.ROM:    EQU  ##              ; I/O DECODE THAT DISABLES ROM
;
;
; NOTE : THIS CODE ASSUMES THAT THE TWO GROUPS OF 189 CHIPS
;        ARE DECODED AT SUCCESSIVE I/O PORTS.
;
;
SET.DEFAULT: LD   HL,MMU.START    ; POINT TO MMU TABLE
             LD   B,0             ; ADDRESS FIRST ENTRY IN MMU
MMU.LOOP:    LD   C,MMU.LO        ; GET ADDRESS OF LOW 189 GROUP
             LD   A,(HL)          ; GET TABLE ENTRY
             CPL                  ; INVERT DATA
             OUT  (C),A           ; WRITE TO LOW 189 GROUP
             INC  HL              ; POINT TO NEXT BYTE IN TABLE
             LD   A,(HL)          ; GET IT
             CPL                  ; INVERT IT
             INC  C               ; POINT TO HIGH GROUP 189
             OUT  (C),A           ; WRITE IT
             INC  HL              ; BUMP TABLE POINTER
             LD   A,B             ; GET MMU REG POINTER
             ADD  A,10H           ; BUMP IT IN THE HIGH 4 BITS
             LD   B,A             ; PUT IT BACK
             CP   0               ; WAS THIS THE LAST ONE?
             JR   NZ,MMU.LOOP     ; KEEP GOING IF NOT
;
;
; WE NOW HAVE RAM MAPPED. WE CAN COPY THE ROM INTO RAM
; AND SWITCH OUT THE ROM.
;
;
             LD   HL,0            ; SET UP SOURCE ADDRESS
             LD   DE,0            ; SET UP DEST ADDRESS
             LD   BC,8000H        ; GET LENGTH = 32K
             LDIR                 ; COPY ALL OF ROM TO RAM
             OUT  (KILL.ROM),A    ; SWITCH ROM OUT
;
; FROM HERE ON, WE ARE RUNNING IN RAM.
;
             LD   SP,7FFFH        ; SET STACK
             .
             .
The above code segment will handle MMU initialization. It first sets up the default mapping of one to one. The first 64K of the physical address space is mapped onto the Z80's logical address space. Then the contents of the ROM are copied into the DRAM. (I never said that writes couldn't go to the DRAM.) The LDIR instruction very nicely copies the first 32K, which is all of the ROM, into the DRAM, at the same logical address. We couldn't have done this until the MMU was initialized.
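The "reads come from ROM, writes land in DRAM" trick is easier to believe once you see it run. The Python toy below models only that behavior. The class and method names are invented for illustration, the MMU is left out because the one to one map makes it transparent here, and the way the 32K ROM repeats in the upper half of the address space is an assumption about the decoding, not something taken from the circuit.

```python
class BootModel:
    """Reads come from ROM while the reset flip-flop is set; writes always go to DRAM."""

    def __init__(self, rom_image):
        self.rom = bytes(rom_image)
        self.dram = bytearray(64 * 1024)   # only the first 64K matters here
        self.rom_overlay = True            # set by hardware reset

    def read(self, addr):
        if self.rom_overlay:
            return self.rom[addr % len(self.rom)]   # assumed mirroring of the ROM
        return self.dram[addr]

    def write(self, addr, value):
        self.dram[addr] = value            # writes never reach the ROM

    def kill_rom(self):
        self.rom_overlay = False           # models the OUT (KILL.ROM),A step


def ldir(machine, src, dst, length):
    """Byte-by-byte block copy, in the manner of the Z80 LDIR instruction."""
    for i in range(length):
        machine.write(dst + i, machine.read(src + i))


rom = bytes((i * 7) & 0xFF for i in range(0x8000))   # stand-in ROM contents
m = BootModel(rom)
ldir(m, 0, 0, 0x8000)       # source and destination are the same logical address
m.kill_rom()
print(all(m.read(a) == rom[a] for a in range(0x8000)))
```

After kill_rom() every read comes from DRAM, and the program finds the same bytes it was executing a moment earlier from the ROM.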
Now, if we just had a couple of variables we could write a routine that would step the page in the last MMU slot. If this routine were called repeatedly it would result in "walking" a window through the entire address space. The window will appear in the last 4K of the Z80's logical address space, 0F000H to 0FFFFH.
/* Also available in LIST2.ASM */
; THIS ROUTINE WILL STEP THE LAST PAGE OF THE MMU. SINCE WE
; CAN'T READ THE MMU WITH AN I/O INSTRUCTION, WE MUST KEEP
; AN IMAGE OF WHAT WE PUT IN IT. THIS ROUTINE WILL ALSO MAKE
; IT CLEAR WHY WE COMPLEMENT THE DATA BEFORE WRITING IT TO THE
; 189'S. IT IS A LOT EASIER TO DO BINARY ARITHMETIC ON POSITIVE
; NUMBERS. SINCE THE 189'S INVERT THE OUTPUTS, WE INVERT, OR
; COMPLEMENT, THE NUMBER WE PUT IN, SO WE WILL GET OUT WHAT
; WE WANT.
;
; IF THE MMU WRAPS AROUND 16MB, THEN THIS ROUTINE WILL RETURN
; WITH "NZ", OR "Z" IF NO ERROR
;
LAST.PAGE:   DW   0FH             ;INITIAL SETTING FOR LAST PAGE IN MMU
;
INC.MMU:     LD   HL,(LAST.PAGE)  ;GET LAST PAGE VALUE
             INC  HL              ;BUMP IT
             BIT  4,H             ;DID WE WRAP AROUND 16MB?
             JR   NZ,MMU.ERR      ;ERROR IF SO
             LD   (LAST.PAGE),HL  ;SAVE NEW MMU VALUE
             LD   B,0F0H          ;POINT TO LAST MMU PAGE
             LD   C,MMU.LO        ;GET POINTER TO WRITE TO MMU
             LD   A,L             ;GET LSB BYTE OF NEW MMU ENTRY
             CPL                  ;INVERT DATA
             OUT  (C),A           ;WRITE TO LOW 189 GROUP
             LD   A,H             ;GET MSB BYTE OF NEW MMU ENTRY
             CPL                  ;INVERT IT
             INC  C               ;POINT TO HIGH GROUP 189
             OUT  (C),A           ;WRITE IT
             XOR  A               ;SET Z FLAG (A = 0)
             RET                  ;SEND BACK GOOD COMPLETION
;
MMU.ERR:     LD   A,0FFH          ;SEND BACK ERROR
             AND  A               ;TO CALLER BY SETTING
             RET                  ;NZ
If we want to bump the MMU page the above code will do the job for us. When we overflow the 16MB barrier we will get back an NZ status, and no change will be made in the MMU. I will leave it as an exercise for the student to figure out what would happen to the system if this test were not included. What would happen? Oh heck, I can't keep a secret. It would start writing over memory at physical location 00000H. Since we put our BASIC interpreter, and interrupt vectors there, the system would crash. All you would see of it is a little mushroom cloud over the CPU chip.
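For anyone who wants to poke at the wrap-around test and the complement business without hardware, here is a small Python model. It is only a sketch: the class Mmu189 and the function step_window are invented names, and the model treats the three 189's as a single 12-bit register per slot.

```python
class Mmu189:
    """Sixteen 12-bit registers whose outputs are the complement of what was written."""

    def __init__(self):
        self.stored = [0] * 16

    def write(self, slot, value):
        self.stored[slot] = value & 0xFFF

    def output(self, slot):
        return ~self.stored[slot] & 0xFFF     # the 189 inverts on the way out


def step_window(mmu, last_page):
    """Advance the page shown in slot 15. Returns (new_page, ok)."""
    next_page = last_page + 1
    if next_page & 0x1000:                    # same test as BIT 4,H: carry out of 12 bits
        return last_page, False               # leave the MMU untouched
    mmu.write(15, ~next_page & 0xFFF)         # complement first, as CPL does
    return next_page, True


mmu = Mmu189()
mmu.write(15, ~0x00F & 0xFFF)                 # default setting for the last slot
page, ok = step_window(mmu, 0x00F)
print(hex(page), ok, hex(mmu.output(15)))     # 0x10 True 0x10
page, ok = step_window(mmu, 0xFFF)
print(hex(page), ok, hex(mmu.output(15)))     # 0xfff False 0x10
```

Note that the image kept by the caller is the true page number; only the value sent to the chips is complemented.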
When setting up the system memory map we must be sure of a couple of things. Certain things must always be available. Some of these are: Interrupt Service Routines (ISRs), interrupt/trap vector tables, and the MMU management code itself. For example, the routine shown above would need to be in common memory. At the least, it would not be good to load this routine anywhere in the range of 0F000H to 0FFFFH. If you did, the result would be a system crash much like the one in the previous paragraph. The Z80 would be executing along until it hit the first I/O instruction that changed the MMU page. After the write, the MMU would be pointing to a different place in memory, and the next instruction fetched would not be likely to be what we want.
There is a way to make this work; you must make sure that the memory page you are switching to also has a copy of the same routine, in the same place in memory. Then it would work. Why would I ever want to do this? Well, let's consider another example application for our MMU circuit; multi-tasking. Let's say that we want to set up a system to watch four serial lines. When data is presented from the SIO, we will store it in memory. To make it easy we would like to write only one copy of the program, and let it multi-task to manage the four serial lines. We will need to write a small multi-tasking kernel. It will handle setting up the four tasks, and any task switching we may need. We will assume a timer interrupt driven preemptive multi-tasking environment. Since the serial lines are using interrupts we must have the ISRs in common memory, or at least duplicated once per task.
The memory model in figure 20 might suit our needs for the multi-tasking system. Notice that there is no space shown in the model for ISRs, kernel, etc. It is all lumped together and called "code". Also note that the code for each task is identical. When each task is started up it is given a task ID, or identifier. This may simply be a byte value in each task's own memory. It will identify the task to the kernel.
Since the ISRs are actually considered to be part of the kernel, incoming data from the serial lines would be placed in a buffer. The task may get the data a couple of ways. First, it could request a "wait for semaphore" from the kernel. In this case the task will be suspended until a byte is received. When this happens, the data is still placed in a buffer. The task is flagged as "ready to run" and started up the next time the kernel is looking for something to do.
Another way to accomplish the same thing is to have each task periodically poll the buffer to see if anything is in it. If so, the data is accessed and processed. If not, the process should make a kernel call to voluntarily surrender the CPU, assuming that it is stalled waiting for data.
The major difference between the two methods has to do with the sophistication required in the real time kernel. For the first method the kernel must be able to handle semaphores and connect them to events. It must also be able to suspend a process and restart it. These are common features of commercial real time kernels. One such kernel I have worked with is the USX80 kernel. It runs on a Z80 and provides all the features listed, and more.
In the second method, most of the "smarts" are moved to the application. A mechanism is required to switch tasks. This may be as simple as saving the machine state (i.e., the CPU registers and flags) to suspend a process. To restart the next process the kernel uses the task number to index into a table of MMU values and reprograms the MMU. If any MMU pages are allowed to be changed, then they will need to be restored from variables stored in each task's memory, after the low MMU registers have been switched to point to the code space for the task. The task's CPU registers are then loaded back into the CPU, and the process restarted. This is fairly simple code.
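The bookkeeping for this simple task switch can be sketched in a few lines of Python. Everything named here is hypothetical: the Kernel class, the full 16-entry map per task, the page numbers in the tables, and the round-robin choice of the next task are all assumptions made for the sake of the example, not a description of any real kernel.

```python
class Kernel:
    """Save the running task's registers, load the next task's MMU map and registers."""

    def __init__(self, mmu_tables, entry_state):
        self.mmu_tables = mmu_tables                         # one 16-entry map per task
        self.saved = [dict(entry_state) for _ in mmu_tables] # saved CPU state per task
        self.current = 0
        self.mmu = list(mmu_tables[0])                       # image of the MMU contents

    def switch(self, cpu_registers):
        self.saved[self.current] = dict(cpu_registers)       # suspend the running task
        self.current = (self.current + 1) % len(self.mmu_tables)
        self.mmu = list(self.mmu_tables[self.current])       # index table by task number
        return dict(self.saved[self.current])                # registers to reload


# Hypothetical maps: task n owns physical pages n*10H through n*10H+0FH.
tables = [[task * 0x10 + slot for slot in range(16)] for task in range(4)]
kernel = Kernel(tables, {"PC": 0x0100, "SP": 0x7FF0})

regs = kernel.switch({"PC": 0x1234, "SP": 0x7FE0})           # task 0 gives up the CPU
print(kernel.current, hex(kernel.mmu[0]), regs)
```

On the real machine the assignment to self.mmu would be a loop of OUT instructions, and it would have to run from memory that stays mapped across the switch.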
The call made by the process to give up the CPU only forces a task switch. When a task loses the CPU because of a timer interrupt it is called preemptive multi-tasking. When a task voluntarily gives up the CPU it is called voluntary multi-tasking.
Both techniques may be combined in a system, and that is very appropriate for a Z80. In data logging applications it is not hard to overrun the CPU if the data comes in too fast. If you have one very high speed data channel, and the rest are of moderate data rate, the inclusion of the voluntary task switch call may speed the system up considerably. The timer based task switch will guarantee that no task can hog the CPU, but it does not allow you to recover idle time from each task by itself. You need both methods together to do that.
If implementing the simpler task switching system, my personal favorite, it would be a good idea to reinitialize the timer chip (within the kernel) when you execute the voluntary task switch. The timer interrupt will be asynchronous with respect to the task switch call so you don't know how much time remains before the timer will generate an interrupt. When you start the next task you would like it to have a full time slice to run before it is interrupted.
Ok, so how do we initialize the memory model in figure 20? I'll bet you thought I'd forgotten that, didn't you? We map the code space for each task into the upper 32K of the Z80's logical address space, then block copy the code into it. The following code will do the trick. Assume that initially we established our default memory map from figure 18. Now we will remap the upper 32K to map in the code space for each task in turn, and copy the code into it. Note that we don't need to map TASK 0 since it is already mapped in by default.
/* Also available in LIST3.ASM */
;
;
;
MMU.DEFAULT: DW   0,1,2,3,4,5,6,7,8,9,0AH,0BH,0CH,0DH,0EH,0FH ; DEFAULT MAP
TASK1:       DW   10H,11H,12H,13H,14H,15H,16H,17H ; MAP ONLY 32K
TASK2:       DW   20H,21H,22H,23H,24H,25H,26H,27H ; MAP ONLY 32K
TASK3:       DW   30H,31H,32H,33H,34H,35H,36H,37H ; MAP ONLY 32K
;
; COPY CODE TO ALL TASK CODE AREAS.
;
COPY.TASK:   LD   HL,TASK1        ; POINT TO TASK 1 MMU TABLE
             LD   B,80H           ; START IN SECOND HALF OF MMU
             CALL SET.MMU.32K     ; MAP 32K
             LD   HL,0            ; SOURCE
             LD   DE,8000H        ; DESTINATION
             LD   BC,8000H        ; LENGTH
             LDIR                 ; COPY 32K UP
             LD   HL,TASK2        ; POINT TO TASK 2 MMU TABLE
             LD   B,80H           ; START IN SECOND HALF OF MMU
             CALL SET.MMU.32K     ; MAP 32K
             LD   HL,0            ; SOURCE
             LD   DE,8000H        ; DESTINATION
             LD   BC,8000H        ; LENGTH
             LDIR                 ; COPY 32K UP
             LD   HL,TASK3        ; POINT TO TASK 3 MMU TABLE
             LD   B,80H           ; START IN SECOND HALF OF MMU
             CALL SET.MMU.32K     ; MAP 32K
             LD   HL,0            ; SOURCE
             LD   DE,8000H        ; DESTINATION
             LD   BC,8000H        ; LENGTH
             LDIR                 ; COPY 32K UP
             LD   HL,MMU.DEFAULT+16 ; RELOAD DEFAULT MMU MAP
             LD   B,80H           ; START IN SECOND HALF OF MMU
             CALL SET.MMU.32K     ; MAP 32K
             RET                  ; DONE
;
; THIS ROUTINE WILL REWRITE THE MMU MAP
;
; NOTE : THIS ROUTINE DOES NOT VALUE CHECK THE MMU
; VALUES WRITTEN. IT IS EXPECTED THAT THIS WILL BE A CRITICAL PATH
; ROUTINE SO ALL EXTRA PROCESSING IS OMITTED IN THE INTEREST OF
; SPEED. THE TABLES USED SHOULD BE CONSTRUCTED WITH CARE.
;
; ENTER : HL ==> TABLE OF MMU VALUES
;         B = MMU PAGE NUMBER TO START CHANGING
;
SET.MMU.32K: LD   C,MMU.LO        ; GET ADDRESS OF LOW 189 GROUP
             LD   A,(HL)          ; GET TABLE ENTRY
             CPL                  ; INVERT DATA
             OUT  (C),A           ; WRITE TO LOW 189 GROUP
             INC  HL              ; POINT TO NEXT BYTE IN TABLE
             LD   A,(HL)          ; GET IT
             CPL                  ; INVERT IT
             INC  C               ; POINT TO HIGH GROUP 189
             OUT  (C),A           ; WRITE IT
             INC  HL              ; BUMP TABLE POINTER
             LD   A,B             ; GET MMU REG POINTER
             ADD  A,10H           ; BUMP IT IN THE HIGH 4 BITS
             LD   B,A             ; PUT IT BACK
             CP   0               ; WAS THIS THE LAST ONE ?
             JR   NZ,SET.MMU.32K  ; KEEP GOING IF NOT
             RET                  ; EXIT WHEN DONE
The above code will take care of initializing the four tasks' code areas. Remember that we said that each task would use the same code. We just map in the code space for each task and copy the code to it. Simple! Now, since each task has the same image of the low 32K of memory, we can switch all of the MMU pages when we do a task switch if we want to.
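It can help to see where those copies actually land in the physical memory. The short Python sketch below works that out from the page numbers in the TASK1, TASK2, and TASK3 tables; the helper name physical_range is made up for illustration.

```python
PAGE_SIZE = 4096

# Page numbers taken from the TASK1, TASK2, and TASK3 tables in the listing.
TASK_TABLES = {
    "TASK1": range(0x10, 0x18),
    "TASK2": range(0x20, 0x28),
    "TASK3": range(0x30, 0x38),
}

def physical_range(pages):
    """First and last physical byte covered by a run of consecutive pages."""
    pages = list(pages)
    return pages[0] * PAGE_SIZE, (pages[-1] + 1) * PAGE_SIZE - 1

for name, pages in TASK_TABLES.items():
    lo, hi = physical_range(pages)
    print(f"{name}: {lo:06X}H-{hi:06X}H")
```

The three copies sit at 010000H, 020000H, and 030000H, each 32K long, so the whole arrangement fits in 256K of DRAM.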
Developing a complete multitasking kernel here would be beyond the scope of this paper, which is implementing DRAM on the Z80. The MMU is an integral part of the memory system, so we must know how it works in order to implement our memory system.
Next, let's look at how the Z80 uses memory.