PIC 16 Delay Tutorial

by Vasily Koudymov

When programming the PIC16 family of microcontrollers, it is sometimes necessary to do absolutely nothing for a certain number of cycles, thereby causing a real-world delay of a known length. This is useful when programming a clock or a frequency generator, for example, and a delay loop is often easier to implement than a built-in TMR timer.

This document will use the following PIC16 assembly instructions:

decfsz f,d:
Although f refers to a memory location, when describing operations performed upon f it is easier to write something like [ f = (f - 1) ] as shorthand for “the value at the address called f will now be equal to the original value at address f with one subtracted from it.” I mention this because a particularly pedantic reader might say that I am in error when I write, “The new value of f will be (f - 1).” Whenever I write that, I mean the longer idea; the shorthand just makes the operations much easier to follow. Let's return to the original topic.

The decfsz instruction performs the operation (f - 1). If d is substituted with the character 'W', the result (f - 1) is placed in the Working Register and f is left unchanged. If d is substituted with the character 'F', the result is placed back into f such that [ f = (f - 1) ].

With regards to the cycle time of the decfsz instruction, it goes as follows: “one cycle if (f - 1) is not equal to 0, after which the next instruction executes, and two cycles if (f - 1) is equal to 0, in which case the instruction immediately after is discarded.” This is irrespective of whether d is 'W' or 'F'.

Examples:

; W = 0x00
; file = 0x05

decfsz file,F ; 1 cycle, will execute 'decfsz file,W'

; W = 0x00
; file = 0x04

decfsz file,W ; 1 cycle, will execute 'movlw 0x01'

; W = 0x03
; file = 0x04

movlw 0x01 ; 1 cycle, move the value 0x01 to Working Register
movwf file ; 1 cycle, file = 0x01

; W = 0x01
; file = 0x01

decfsz file,F ; 2 cycles, file = (file - 1) is zero, so discard next instruction
movlw 0x20 ; discarded, we ignore

; W = 0x01
; file = 0x00
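
The behaviour walked through above can also be modelled in a few lines of Python, which is handy for checking cycle counts by hand. This is only an illustrative sketch: the function name decfsz and its return tuple are assumptions made for this example and are not part of any Microchip tool.

```python
def decfsz(value):
    """Model decfsz on an 8-bit value.

    Returns (result, cycles, skip_next): the decremented value, the
    cycles consumed, and whether the following instruction is skipped.
    """
    result = (value - 1) & 0xFF      # 8-bit wrap-around: 0 - 1 gives 255
    skip_next = (result == 0)
    cycles = 2 if skip_next else 1   # a skip costs one extra cycle
    return result, cycles, skip_next


print(decfsz(0x05))  # (4, 1, False)
print(decfsz(0x01))  # (0, 2, True)
```

Whether the result lands in W or in f does not change the cycle count, so the model leaves the destination to the caller.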

goto k:
This command goes to the address k in the program memory. Although the “goto” statement is looked down upon in higher level languages, in assembly languages it is necessary to complete most programs. What is great about Microchip's assembler, which we will use to assemble our PIC16 code, is that it allows relative addresses for k rather than only absolute addresses. For example, 0x1234 is an absolute address which refers to the address 0x1234 in the PIC16's program memory. Please do not confuse program memory with RAM. RAM is where the file registers are located, and they can be modified at runtime. Program memory holds the actual program code which you upload to the microcontroller, and for our purposes it is read only while the program is running.

In contrast, relative addresses work as follows: if the character '$' is substituted for k, it refers to the address of the goto instruction itself (resulting in an infinite loop). If “$+1” is substituted for k, execution goes to the instruction one below the goto. If k is replaced with “$-5”, execution goes to the instruction 5 above the goto. In all cases, goto takes two cycles to execute.

Examples:

clrf PORTA 	; 1 cycle - clear PORTA file register.
goto $-1 	; 2 cycles - go one instruction above and continue execution.

In the above case, it would cause an infinite loop where PORTA is constantly cleared.

goto $+2 	; 2 cycles - go two instructions below and continue execution.
movlw 0x00 	; 1 cycle - place the value 0x00 in the W register.
andlw 0x01 	; 1 cycle - AND 0x01 and the contents of W register.

In the above case, movlw 0x00 is skipped over due to the goto, and it takes 3 cycles altogether.

Let's combine the decfsz and goto instructions to create the simplest loop:

decfsz aa, F
goto $-1

We will refer to this as a one stage delay loop, but what does it do? So long as decfsz does not produce a value of zero, it takes 1 cycle to execute, and the goto after it then takes 2 cycles to execute. In sum, this segment of code takes 3 cycles each time decfsz does not produce zero. When it produces zero, decfsz discards the goto instruction and takes 2 cycles. So in short: 3 cycles if not resulting in zero, 2 cycles if resulting in zero.

Let's plug in some values for aa:

aa = 1: 2 cycles
aa = 2: 3 + 2 = 5 cycles
aa = 3: 3 + 3 + 2 = 8 cycles

Or, in general [3(aa - 1)] + 2 cycles for this delay loop, because eventually there will be one value for aa which yields zero, while all of the rest do not. As such, if we had aa = 32, 31 of the values will take 3 cycles, and 1 of the values will take 2 cycles. We can simplify this function as follows:

[3(aa - 1)] + 2
3aa - 3 + 2
3aa - 1
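
As a sanity check on this formula, the loop can be stepped through in a small Python simulation. This is a minimal sketch, assuming the cycle costs stated above (3 cycles per non-final pass, 2 cycles for the final pass); the function name one_stage_cycles is made up for this example.

```python
def one_stage_cycles(aa):
    """Count cycles of 'decfsz aa,F / goto $-1' by stepping through it."""
    cycles = 0
    while True:
        aa = (aa - 1) & 0xFF         # decfsz aa,F on an 8-bit register
        if aa == 0:
            return cycles + 2        # decfsz skips the goto: 2 cycles
        cycles += 3                  # decfsz (1) + goto $-1 (2)


for aa in (1, 2, 3, 32):
    print(aa, one_stage_cycles(aa), 3 * aa - 1)
```

For each value printed, the simulated count and the value of 3aa - 1 agree.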

What about when aa is initialized to zero?

When aa = 0, it will be decreased by one first, which yields 255, a non-zero value. Effectively, initializing aa = 0 is like initializing aa = 256. Note that 256 is not a valid value for an 8 bit number, so in order to initialize aa = 256, one has to use aa = 0.

What are the limits of the delay loop?

Given that aa can effectively be set from aa = 1 to aa = 256 (by way of initializing aa = 0), the minimum and maximum number of cycles which can be generated using this delay loop are found by plugging those values into the derived equation:

min: 3(1) - 1 = 2 cycles

max: 3(256) - 1 = 767 cycles

Knowing this range will become important when we want to figure out how many variables we need in a delay loop. In order to simplify its presentation, the minimum and maximum number of cycles for a delay loop will be written as [2~767], which indicates that the loop can generate at least 2 cycles and at most 767 cycles.

If we want to make a delay loop that includes more cycles, we would simply add another variable as follows:

decfsz aa,F
goto $-1
decfsz bb,F
goto $-3

Before we delve into what this specific code does, we will introduce new notation. cyc() will be used to refer to the number of cycles a delay loop takes. cyc(aa) means that the delay loop has only one eight-bit variable in it (called aa); we refer to it as a one stage loop, and its output is a number of cycles. cyc(aa,bb) means that the loop involves two variables (called aa and bb respectively), and we refer to it as a two stage loop. This pattern continues onward to however many variables you use. From this point onward, the following:

cyc(aa) = 3aa - 1 u [2~767]

will be taken to mean a one stage delay loop with the formula (3aa - 1), which has one 8 bit variable called aa within it and which can generate between 2 and 767 cycles inclusive. Note that the range for all input variables (such as aa) is actually [0~255], with 0 equating to 256.

With regards to what the above code is, it is a two stage loop. If time is taken to analyze the code, the following pattern will emerge:

[3(aa - 1) + 2] + { [(max of one stage loop) + 3](bb - 1) + 2}

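The pattern arises because every pass of the bb loop except the last costs 3 cycles (1 for decfsz bb,F and 2 for goto $-3) plus a complete rerun of the one stage loop. Since aa is left at 0 when the one stage loop exits, each rerun behaves as if aa = 256 and takes the one stage maximum of 767 cycles. The last pass of bb costs only the 2 cycles of the skipping decfsz. To convince yourself, it would be prudent to sit down with an empty sheet of paper and go through a few iterations of a two stage loop. Now to actually plug in the values for the above derivation: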

[3(aa - 1) + 2] + {[767 + 3](bb - 1) + 2}
[3aa - 3 + 2] + {770(bb - 1) + 2}
[3aa - 1] + {770bb - 770 + 2}
3aa - 1 + 770bb - 770 + 2
3aa + 770bb - 769

After plugging in the minimum values allowed for aa and bb as well as the maximum values, we obtain the range and our result is the following:

cyc(aa,bb) = 3aa + 770bb - 769 u [4~197119]
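
Instead of pencil and paper, the iterations of the two stage loop can be stepped through in Python. This is a minimal sketch, assuming the instruction costs given earlier (decfsz: 1 cycle, or 2 when it skips; goto: 2 cycles); the function name two_stage_cycles is made up for this example.

```python
def two_stage_cycles(aa, bb):
    """Step through the two stage loop and count cycles.

    aa and bb are the initial register values (0 acts as 256).
    """
    cycles = 0
    while True:
        aa = (aa - 1) & 0xFF         # decfsz aa,F
        if aa != 0:
            cycles += 3              # decfsz (1) + goto $-1 (2)
            continue
        cycles += 2                  # decfsz skips 'goto $-1'
        bb = (bb - 1) & 0xFF         # decfsz bb,F
        if bb == 0:
            return cycles + 2        # decfsz skips 'goto $-3': done
        cycles += 3                  # decfsz (1) + goto $-3 (2)
        # aa is 0 here, so the next inner pass runs the full 256 times


print(two_stage_cycles(1, 1))    # 4
print(two_stage_cycles(1, 2))    # 774
print(two_stage_cycles(0, 0))    # 197119
```

The three printed values match 3aa + 770bb - 769 at the minimum, at (1, 2), and at the maximum (with 0 standing for 256).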

In addition, a general formula for any stage is found in the following:

[ previous stage cycle formula ] + [(max of previous formula) + 3][(new variable) - 1] + 2

Let us apply this to a three stage loop:

decfsz aa,F
goto $-1
decfsz bb,F
goto $-3
decfsz cc,F
goto $-5

Cycle formula:

[3aa + 770bb - 769] + [197119 + 3](cc - 1) + 2
[3aa + 770bb - 769] + [197122](cc - 1) + 2
[3aa + 770bb - 769] + [197122cc - 197122] + 2
3aa + 770bb + 197122cc -769 -197122 + 2
3aa + 770bb + 197122cc - 197889

After finding the range via plugging in allowable minimums and maximums, we obtain the formula:

cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197889 u [6~50463231]

Thus far, our formulas are:

cyc(aa) = 3aa - 1 u [2~767]
cyc(aa,bb) = 3aa + 770bb - 769 u [4~197119]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197889 u [6~50463231]
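
The general rule (previous formula, plus [max of previous + 3] times [new variable - 1], plus 2) can also be applied mechanically. The sketch below is one way to do that in Python; the function name stage_formulas and its return layout are assumptions made for this example.

```python
def stage_formulas(stages):
    """Return (coefficients, constant, minimum, maximum) for an N stage loop.

    cycles = sum(coefficient * variable) + constant, with each variable
    in 1..256 (256 is written to the register as 0).
    """
    coefficients = [3]               # one stage loop: 3aa - 1
    constant = -1
    for _ in range(stages - 1):
        maximum = sum(c * 256 for c in coefficients) + constant
        new_coefficient = maximum + 3
        # previous formula + new_coefficient * (variable - 1) + 2
        constant += 2 - new_coefficient
        coefficients.append(new_coefficient)
    minimum = sum(coefficients) + constant
    maximum = sum(c * 256 for c in coefficients) + constant
    return coefficients, constant, minimum, maximum


for n in (1, 2, 3):
    print(n, stage_formulas(n))
```

For n = 3 this gives the coefficients 3, 770, and 197122, the constant -197889, and the range 6 to 50463231, the same as the three stage formula above.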

These formulas are great, so long as you copy and paste the code each time you need a delay. However, in the real world, it's more efficient and often much easier to use subroutines. A subroutine in assembly language is analogous to a function in a higher level language. Instead of typing out your code each time you need a delay, you can use subroutines. Here's how they appear in code:

; this is an excerpt from the main code section of an assembly program
call delay 		; 2 cycles, call the subroutine which we call 'delay', it's like calling a function
; the actual code for a one stage delay loop subroutine
delay:
	decfsz aa,F 	; use the formula for next two lines
	goto $-1
	return 		; 2 cycles, return to the caller

Or, in general:

; this is an excerpt from the main code section of an assembly program
call delay 	; 2 cycles, call the delay subroutine
; the declaration for the delay subroutine
delay:
	<code for an N-stage loop>
	return

You may notice that before, we were just using the code for an N-stage loop. When we turn it into a subroutine, we add four more cycles: two for the call and two for the return. This applies to any number of stages. In order to adjust the delay loop formulas for these additional cycles, we add four to both the formula and the limits, such that:

cyc(aa) = 3aa - 1 u [2~767]
cyc(aa,bb) = 3aa + 770bb - 769 u [4~197119]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197889 u [6~50463231]

becomes:

cyc(aa) = 3aa - 1 + 4 u [2 + 4~767 + 4]
cyc(aa,bb) = 3aa + 770bb - 769 + 4 u [4 + 4~197119 + 4]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197889 + 4 u [6 + 4~50463231 + 4]

and simplifies to:

cyc(aa) = 3aa + 3 u [6~771]
cyc(aa,bb) = 3aa + 770bb - 765 u [8~197123]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197885 u [10~50463235]

Although the formula is becoming more and more complete, there is still one last step. It involves initializing the loop with specific values so as to obtain the desired number of cycles.

It is best to initialize the loop within the subroutine, as it makes the code less clunky and makes it easier to use conditional skip instructions (such as btfss, decfsz, and incfsz). The reason is that if the loop is initialized inside the subroutine, after the call statement, code like this can be used:

btfss STATUS,Z 	; check if the previous operation yielded zero
call delay 	; delay for some amount of time

while in contrast, this would not be possible if the loop was initialized outside of the subroutine:

btfss STATUS,Z 	; check if the previous operation yields zero
clrf aa 	; clear the aa variable
call delay 	; call the delay

As these two examples show, only in the first is the delay conditional. In the second, the only conditional part is the clrf aa, and “clear aa only if the previous operation does not yield zero” is not equivalent to “call the delay routine only if the previous operation does not yield zero.”

As to initializing within subroutines, there are two choices: static delay loops and variable delay loops.

Here are examples of how static loops are to be initialized:

; one stage delay loop subroutine
delay:
	movlw D'5' 	; 1 cycle
	movwf aa 	; 1 cycle
	<code for one stage delay loop>
	return
; two stage delay loop subroutine
delay:
	movlw D'5' 	; 1 cycle
	movwf aa 	; 1 cycle
	movlw D'23' 	; 1 cycle
	movwf bb 	; 1 cycle
	<code for two stage delay loop>
	return
; three stage delay loop subroutine
delay:
	movlw D'5' 	; 1 cycle
	movwf aa 	; 1 cycle
	movlw D'23' 	; 1 cycle
	movwf bb 	; 1 cycle
	movlw D'3' 	; 1 cycle
	movwf cc 	; 1 cycle
	<code for three stage delay loop>
	return

From here, it should be noticed that if a static delay loop subroutine involves N variables, it requires 2N cycles to initialize them. This means that a one stage loop takes 2 cycles to initialize, a two stage loop takes 4 cycles, and a three stage loop takes 6 cycles. Similarly to the adjustments required in turning simple delay loops into subroutines, we must add these additional cycles to the formulas as follows:

cyc(aa) = 3aa + 3 u [6~771]
cyc(aa,bb) = 3aa + 770bb - 765 u [8~197123]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197885 u [10~50463235]

becomes:

cyc(aa) = 3aa + 3 + 2 u [6 + 2~771 + 2]
cyc(aa,bb) = 3aa + 770bb - 765 + 4 u [8 + 4~197123 + 4]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197885 + 6 u [10 + 6~50463235 + 6]

and simplifies to:

cyc(aa) = 3aa + 5 u [8~773]
cyc(aa,bb) = 3aa + 770bb - 761 u [12~197127]
cyc(aa,bb,cc) = 3aa + 770bb + 197122cc - 197879 u [16~50463241]
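
To see where each part of these final constants comes from, the three contributions (bare loop, call and return, initialization) can be added up separately. The following Python sketch does this for the values used in the static examples above; the function name static_subroutine_cycles is an assumption made for this example.

```python
def static_subroutine_cycles(values):
    """Total cycles for 'call delay' on a static N stage delay subroutine.

    values holds the effective loop counts (1..256) for aa, bb, cc.
    """
    coefficients = [3, 770, 197122]  # loop coefficients derived earlier
    constants = [-1, -769, -197889]  # bare-loop constants derived earlier
    n = len(values)
    if not 1 <= n <= 3:
        raise ValueError("only one, two, and three stage loops are covered")
    loop = sum(c * v for c, v in zip(coefficients, values)) + constants[n - 1]
    call_and_return = 4              # call (2) + return (2)
    initialization = 2 * n           # movlw + movwf per variable
    return loop + call_and_return + initialization


print(static_subroutine_cycles([5]))          # 20
print(static_subroutine_cycles([5, 23]))      # 16964
print(static_subroutine_cycles([5, 23, 3]))   # 411212
```

These agree with 3aa + 5, 3aa + 770bb - 761, and 3aa + 770bb + 197122cc - 197879 for aa = 5, bb = 23, cc = 3.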

Static delay loops are great for when you only need a constant number of delay cycles, but what about when it varies? We can use variable delay loops, and the code is as follows:

; one stage delay loop subroutine
delay:
	movf aak,W 	; 1 cycle
	movwf aa 	; 1 cycle
	<code for one stage delay loop>
	return
; two stage delay loop subroutine
delay:
	movf aak,W 	; 1 cycle
	movwf aa 	; 1 cycle
	movf bbk,W 	; 1 cycle
	movwf bb 	; 1 cycle
	<code for two stage delay loop>
	return
; three stage delay loop subroutine
delay:
	movf aak,W 	; 1 cycle
	movwf aa 	; 1 cycle
	movf bbk,W 	; 1 cycle
	movwf bb 	; 1 cycle
	movf cck,W 	; 1 cycle
	movwf cc 	; 1 cycle
	<code for three stage delay loop>
	return

Here aak, bbk, and cck are values which you need to initialize only once. Afterward, whenever you call the delay subroutine, it initializes the loop with aa = aak, bb = bbk, cc = cck, or mnemonically aa(variable) = aa(constant). Should you desire to change the number of cycles to delay for, all that needs to be changed are the constant values.

Variable delay loops take exactly the same number of cycles as static delay loops, since movf takes one cycle just as movlw does. The formulas are therefore the same, with the constant values taking the place of the variables:

cyc(aak) = 3aak + 5 u [8~773]
cyc(aak,bbk) = 3aak + 770bbk - 761 u [12~197127]
cyc(aak,bbk,cck) = 3aak + 770bbk + 197122cck - 197879 u [16~50463241]

Please be certain that you initialize the constants of a variable delay loop before it is first called.

Now we can discuss the topic of solving for the constants given that a certain number of cycles is required:

We need 600 cycles:

cyc(aak) = 3aak + 5 = 600
3aak = 600 - 5
3aak = 595
aak = 595/3 = 198.3333

As aak can only be a whole integer, we assign 198 to it. We never round to the nearest integer; we always truncate. What we do with the fractional part 0.3333 is multiply it by 3 to convert it back to how many cycles are needed in addition to what the loop can supply. Essentially, we are finding the remainder:

198.3333 - 198 = [value after division - value assigned to constant aak] = 0.3333
0.3333 x 3 = 0.9999

We round to the nearest integer only at this very last step. The remainder tells us how far short of our desired 600 cycles we fall if we use the value 198 for aak. As 0.9999 rounded to the nearest integer is 1, we are short 1 cycle, which indicates that the delay loop only generates 3(198) + 5 = 599 cycles. In practice, after the final division and before multiplying by three, a 0.3333 excess indicates one cycle short, while a 0.6666 excess indicates two cycles short. To remedy this deficiency, we recommend the following:

; this can be placed in the main code
call delay 		; 599 cycles
	nop 		; 1 cycle
; or we can add the null operation to the subroutine
delay:
	movf aak,W 	; 1 cycle
	movwf aa 	; 1 cycle
	<code for one stage delay loop>
	nop 		; 1 cycle
	return

Note that for the second solution, embedding a one cycle null operation within the delay subroutine adds one more cycle to it, making the formula change from:

cyc(aak) = 3aak + 5 u [8~773]

to this:

cyc(aak) = 3aak + 5 + 1 u [8 + 1~773 + 1]

which simplifies to:

cyc(aak) = 3aak + 6 u [9~774]

Therefore, in case you decide to embed the nop instruction within your delay loop subroutine, make certain to modify the formula as well.
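
The truncate-and-pad procedure for a one stage subroutine can be written down compactly with integer division. This is a minimal Python sketch, assuming the formula cyc(aak) = 3aak + 5 from above; the function name solve_one_stage, the overhead parameter, and the range check are additions made for this example.

```python
def solve_one_stage(target_cycles, overhead=5):
    """Return (aak, shortfall) for cyc(aak) = 3*aak + overhead.

    shortfall is the number of cycles (0, 1, or 2) still missing,
    which can be made up with nop instructions.
    """
    aak, shortfall = divmod(target_cycles - overhead, 3)
    if not 1 <= aak <= 256:
        raise ValueError("target is outside the range of a one stage loop")
    return aak, shortfall            # store 0 in the register when aak == 256


print(solve_one_stage(600))  # (198, 1)
```

For 600 cycles this gives aak = 198 with one cycle to make up. If the nop is embedded in the subroutine, the overhead becomes 6 and solve_one_stage(600, overhead=6) returns (198, 0).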

Before we continue, I will now explain the procedure for finding the remainder on a calculator:

We want to take 50002 and divide it by 35. We require both the quotient and the remainder and begin by dividing this expression in our calculators:

50002/35 = 1428.628571

This makes the quotient 1428, and now to find the remainder:

1428.628571 - 1428 = 0.628571

We take this and multiply it by the number we divided by:

0.628571 x 35 = 21.999985, which rounds to 22

Therefore, the final answer is:

1428 remainder 22

or in the shorthand we will use (where ex is excess) throughout this document:

1428 ex 22
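
If a computer is at hand rather than a calculator, the quotient and excess come out of one integer division. The Python sketch below shows both routes; the function names are made up for this example, and divmod is Python's built-in for quotient and remainder.

```python
def quotient_and_excess(dividend, divisor):
    """Integer division giving (quotient, excess) in one step."""
    return divmod(dividend, divisor)


def excess_calculator_style(dividend, divisor):
    """The calculator procedure: fractional part times the divisor."""
    value = dividend / divisor
    fraction = value - int(value)
    return round(fraction * divisor)  # round only at the very end


quotient, excess = quotient_and_excess(50002, 35)
print(quotient, "ex", excess)                 # 1428 ex 22
print(excess_calculator_style(50002, 35))     # 22
```

The integer route avoids the rounding step entirely, which is why it is the safer choice when it is available.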

With this knowledge in mind, we can now solve for a three stage delay loop for 600 cycles:

cyc(aak,bbk,cck) = 3aak + 770bbk + 197122cck - 197879 u [16~50463241]

The procedure should be fairly intuitive if followed. Its rules are nearly the same as above, but remember to bring the constant (integer at the very end) to the other side, and to start with the largest coefficient in division:

cyc(aak,bbk,cck) = 3aak + 770bbk + 197122cck - 197879 = 600
3aak + 770bbk + 197122cck = 600 + 197879
3aak + 770bbk + 197122cck = 198479

solve for cck by taking the new constant, and dividing it by the coefficient in front of cck:

198479/197122 = 1 ex 1357 [as a result, cck = 1]

we find bbk by taking the remainder and dividing it by the coefficient in front of bbk:

1357/770 = 1 ex 587 [as a result, bbk = 1]

and finally, we find aak by taking this remainder and dividing it by the coefficient in front of aak:

587/3 = 195 ex 2 [as a result, aak = 195]

Because the last remainder is two, we now know that these values leave us two cycles short of 600. Let's verify:

cyc(aak,bbk,cck) = 3aak + 770bbk + 197122cck - 197879
cyc(195,1,1) = 3(195) + 770(1) + 197122(1) - 197879
cyc(195,1,1) = 585 + 770 + 197122 - 197879
cyc(195,1,1) = 585 + 770 - 757
cyc(195,1,1) = 585 + 13
cyc(195,1,1) = 598

We must also discuss the topic of picking stages. Clearly, any of the three formulas we have derived will work for a 600 cycle loop. In practice, however, it is better to use the loop with the fewest variables where possible, as it conserves memory. Consider that a one stage loop requires two bytes of RAM for aa and aak, and a two stage loop requires four bytes of RAM for aa, aak, bb, and bbk. Following the pattern, a three stage loop requires six bytes of RAM, and in general an N-stage loop requires 2N bytes of RAM. In addition, initializing the constants aak, bbk, and cck takes two cycles per variable, so for a three stage loop this means 6 cycles. This may be wasteful if your application barely fits into the microcontroller.

Another concern when using delay loops is offset errors. You may be tempted to use a 5000 cycle delay loop if you need your instruction to execute every 5000 cycles. However, this would create an offset error, since your instruction itself takes at least 1 cycle, so it would actually execute every 5001 cycles.

In order to avoid this, pad your time critical routine or set of instructions so that it always takes the same number of cycles to process, and set the delay loop to your desired value minus the number of cycles your instructions or routine take.
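
As a small worked example of this correction, the delay to request is simply the period minus the fixed cost of the work. The Python sketch below uses a made-up function name, delay_cycles_needed, and assumes the work has already been padded to a constant cycle count.

```python
def delay_cycles_needed(period_cycles, work_cycles):
    """Cycles the delay must supply so that work + delay == period."""
    delay = period_cycles - work_cycles
    if delay < 0:
        raise ValueError("the work alone takes longer than the period")
    return delay


print(delay_cycles_needed(5000, 1))   # 4999
```

The result is then the target to solve the delay constants for, in place of the full period.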

Sometimes, it is important to use delay loops to create real time delays. What this means is that sometimes it is necessary to calculate how many seconds a delay loop will take. To do this, we include the following:

The PIC16 architecture is unusual in that each instruction cycle takes four oscillator clock periods. When you use a crystal oscillating at 4.000 MHz, the number of instructions per second with simple single-cycle instructions (such as nop, movlw, bcf, etc.) is therefore 1.000 MIPS (million instructions per second). The number of MIPS can be found by taking the frequency in megahertz and dividing it by 4, and the number of instructions per second (IPS) can be found by multiplying this answer by 1,000,000. Thus:

[ips] = [xtal frequency in MHz] * 1,000,000 / 4

[ips] = [xtal frequency in MHz] * 250,000

Now that we know how many instructions per second it executes, we can figure out how many seconds a number of instructions will take by dividing the number of cycles by the number of instructions per second.

Thereby:

[seconds] = [cycles] / [ips]

or:

[seconds] = [cycles] / ([xtal frequency in MHz] * 250,000)

and since frequency is the inverse of time (period):

[frequency in Hz] = 1 / [seconds]

[frequency in Hz] = 1 / [[cycles] / ([xtal frequency in MHz] * 250,000)]

[frequency in Hz] = [xtal frequency in MHz] * 250,000 / [cycles]
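
These conversions are easy to get wrong by a factor of four, so a few lines of Python can serve as a check. This is a minimal sketch of the formulas above; the function names are made up for this example, and cycles means single instruction cycles as counted by the delay formulas.

```python
def instructions_per_second(xtal_mhz):
    """One instruction cycle takes four oscillator clocks."""
    return xtal_mhz * 250_000


def delay_seconds(cycles, xtal_mhz):
    """Real time taken by a delay of the given number of cycles."""
    return cycles / instructions_per_second(xtal_mhz)


def repeat_frequency_hz(cycles, xtal_mhz):
    """How often per second something repeats every 'cycles' cycles."""
    return instructions_per_second(xtal_mhz) / cycles


print(delay_seconds(600, 4.0))          # 0.0006
print(repeat_frequency_hz(5000, 4.0))   # 200.0
```

With a 4 MHz crystal, then, a 600 cycle delay lasts 600 microseconds, and something that repeats every 5000 cycles runs at 200 Hz.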

Last Revision: 2007-05-14