James Cameron says:

These PICs are just too fast; we often need a way to make the processor wait around for a while. A series of NOP (no operation) instructions is straightforward, but wasteful of instruction memory. A counted loop is the next common trick, because it lets us tune the delay.

But what are some of the more exotic ways to delay?

  1. ) The Common NOP, a delay for one instruction cycle

    DELAY	nop	; delay one cycle
    

    Advantages: extremely simple, obvious, maintainable, scalable.
    Disadvantages: is it really there for a reason?

  2. ) The Common Loop, three times the initial value plus three

    DELAY	mov	W, #95	; 288 cycle delay
    	mov	COUNTER, W
    	decsz	COUNTER
    	jmp	$-1
    

    Advantages: fairly simple, maintainable, scalable up to 771 cycles; after that go for a nested loop.
    Disadvantages: costs one file register; though there is a variant using just the W register.

  3. ) The Novel GOTO, two instruction cycles in one instruction

    	jmp	$+1	; two cycle delay
    

    Advantages: half the space of two NOPs.
    Disadvantages: obscure unless commented.

  4. ) The call to Nowhere

    	org	0
    	jmp	MAIN
    FOUR	ret	; four cycle delay function
                    [...]
    DELAY	call	FOUR
    

    Advantages: quarter the space of four NOPs, the RETURN can be reused by other code, good use for those three words between the reset vector and the interrupt vector on a PIC 16F84.
    Disadvantages: implementation separate from use, can look odd, uses one stack level.

    Scott Dattalo says:

    ...if you want a 4-cycle single-instruction delay:

    	call	some_return_in_your_code

    Of course, you run the risk of stack overflow on [some processors].

  5. ) The Double Call to Nowhere, eight or four cycles

    	org	0
    	jmp	MAIN
    EIGHT	call	FOUR
    FOUR	ret
                    [...]
    DELAY	call	EIGHT
    

    Advantages: looks simple, allows various size delays to be rapidly constructed during prototyping.
    Disadvantages: uses two stack levels.

    [ed: See the "Stack Recursive Delay" below]

  6. ) The Do Something Useful Extension to the Common Loop, five times the initial value plus three

    DELAY	mov	W, #95
    	mov	COUNTER, W
    
    	mov	W, TRISM	; fetch TRIS mirror
    	mov	!RB, W	; reapply it
    	decsz	COUNTER
    	jmp	$-3
    

    Advantages: good for precise delays that are not multiples of three, allows useful functionality to be placed within the delay.
    Disadvantages: increased convolution of code, lower maintainability.

  7. ) The Long Delay Using Timer, more of a technique than a code fragment, set the timer, wait for it to roll over.

    Advantages: immune to distortion by interrupts, easily scaled using a prescaler, even possible to tune the delay by modifying the preload value.
    Disadvantages: allocates a timer.

    Paul B. Webster says:

    I feel that many applications (real-time control, clocks) resolve to a fundamental "tick" or hierarchy thereof, often around a millisecond, which method {7} provides. Counting a thousand of these gives a second, at which point a train of countdowns (semaphores) can lead to various housekeeping actions i.e., if T1 then { T1--; if T1 == 0 then action1 };

    Also on the 1 ms "ticks", a debounce uses a counter which counts down from 20 (20 ms) to verify a keypress/release. Other countdowns in ms are used for playing tunes or "tick", "click" or "blip" noises.

    It is highly undesirable to pre-load or fiddle with the TMR0 when you are using it this way, firstly because it interferes with the prescaler in a very inconvenient fashion, and secondly as the 1 ms countdowns can be used for a 500 Hz tone while the TMR0 MSB can be copied to a port for a 1 kHz tone, bit 6 for a 2 kHz tone, etc., controlled as above in even numbers of milliseconds.

    Similarly, you can wait on those individual bits by polling, for sub-delays. A sub-delay on bit 3 toggling could be used to time a phase accumulator (32 kHz clock) to generate quite a complete range of square wave tones.

    So, you may say that method "uses" a timer-counter, but I submit that in a well-designed application, you get an awful lot of "use" out of it!

  8. ) The Watchdog Delay, go to sleep and get woken by the watchdog.

    Advantages: extremely simple technique, can be varied by changing the WDT prescaler ratio.
    Disadvantages: difficult to calibrate.

    [ed: Set up a type of state machine that records where the processor should continue executing code after the reset. Make sure you clear it in your non-timed routines. Also see Using the watchdog timer to sense temperature]

  9. ) The Data EEPROM Delay, a typically 10ms delay that can be triggered by writing to data EEPROM and waiting for the interrupt.

    Advantages: tests the endurance of the EEPROM.
    Disadvantages: tests the endurance of the EEPROM.
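
The counted loops in items 2 and 6 quote their delay as a multiple of the initial value plus three cycles (three per pass for the Common Loop, five per pass for the extended one). The following is a minimal Python sketch of that arithmetic, meant to be run on a desktop rather than on the PIC, for picking the initial value for a target delay. The function names are hypothetical, and the formula is taken from the item headings above, not re-measured on hardware.

```python
def loop_cycles(initial, per_pass=3, overhead=3):
    """Delay in instruction cycles for a counted loop, using the formula
    quoted in the item headings (per_pass * initial + overhead)."""
    if not 1 <= initial <= 256:
        raise ValueError("an 8-bit counter gives 1..256 passes (0 counts as 256)")
    return per_pass * initial + overhead


def initial_for(target, per_pass=3, overhead=3):
    """Return (initial, padding): the largest loop count whose delay does not
    exceed target, and the leftover cycles to fill with NOPs."""
    initial = (target - overhead) // per_pass
    if not 1 <= initial <= 256:
        raise ValueError("target is out of range for a single 8-bit loop")
    return initial, target - loop_cycles(initial, per_pass, overhead)
```

For example, `initial_for(290)` returns `(95, 2)`: load 95 into the counter and pad with two NOPs (or one `jmp $+1`). An initial value of 256 stands for loading 0 into the counter, which is where the 771-cycle ceiling in item 2 comes from.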

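Paul B. Webster's scheme under method 7 (a 1 ms tick driving countdown "semaphores" and a 20 ms debounce) can be modelled in a few lines. This is only a Python sketch of the bookkeeping, not PIC code; the class names are hypothetical, and the debounce rule shown (reload the counter whenever the raw input agrees with the accepted state) is one possible reading of his description.

```python
class Countdown:
    """One software timer, decremented once per tick; fires on reaching zero."""

    def __init__(self, action):
        self.remaining = 0          # 0 means idle
        self.action = action

    def start(self, ticks):
        self.remaining = ticks

    def tick(self):
        # if T1 then { T1--; if T1 == 0 then action1 }
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                self.action()


class Debouncer:
    """Accept a new key state only after it has held for `hold` ticks in a row."""

    def __init__(self, hold=20):
        self.hold = hold
        self.stable = False         # last accepted key state
        self.counter = hold

    def tick(self, raw):
        if raw == self.stable:
            self.counter = self.hold    # nothing pending: reload the counter
        else:
            self.counter -= 1
            if self.counter == 0:       # change has persisted long enough
                self.stable = raw
                self.counter = self.hold
        return self.stable
```

Both `tick` methods would be called once per millisecond tick. On the PIC itself each object reduces to a single file register and a handful of instructions.
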
Tony Nixon says:

Here's a simple example [for the 16F84 at 4 MHz], but it won't be accurate to the second, so you may have to tweak it. Perhaps you could use it as a basis for your code.
	mov	W, #x	; x = hours delay
	mov	hours, W
	call	Hours_Delay

        ; rest of code continues

Hours_Delay
	call	Hour_Delay
	decsz	hours
	jmp	Hours_Delay
	ret

Hour_Delay
	mov	W, #60
	mov	mins, W
Rst_Loop
	mov	W, #60
	mov	secs, W
Hour_Loop
	call	Second_Delay
        ;
        ; maybe some processing in here
        ;
	decsz	secs
	jmp	Hour_Loop
	decsz	mins
	jmp	Rst_Loop
	ret

Second_Delay
	mov	W, #$01
	mov	NbHi, W
	mov	W, #$06
	mov	NbLo, W
	mov	W, #$13
	mov	NaHi, W
	mov	W, #$B5
	mov	NaLo, W
DeLoop0
	decsz	NaLo
	jmp	DeLoop0
	decsz	NaHi
	jmp	DeLoop0
	decsz	NbLo
	jmp	DeLoop0
	decsz	NbHi
	jmp	DeLoop0
	ret
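
To see how close Second_Delay comes to one second before tweaking its constants, the nested loop can be stepped through on a desktop. The sketch below is a Python model, not PIC code. It assumes the usual 16F84 timings (one cycle per instruction, two for a taken jump, a skipping `decsz`, a call, or a return), and the function name is hypothetical.

```python
def second_delay_cycles(na_lo=0xB5, na_hi=0x13, nb_lo=0x06, nb_hi=0x01):
    """Count instruction cycles for one `call Second_Delay`, return included."""
    counters = [na_lo, na_hi, nb_lo, nb_hi]   # in the order the decsz lines test them
    cycles = 2 + 8            # the call, plus eight single-cycle register loads
    level = 0                 # index of the decsz about to execute
    while level < 4:
        counters[level] = (counters[level] - 1) & 0xFF   # 8-bit wrap-around
        if counters[level]:
            cycles += 1 + 2   # decsz without skip, then jmp DeLoop0
            level = 0
        else:
            cycles += 2       # decsz skips the jmp and falls to the next counter
            level += 1
    return cycles + 2         # final ret
```

At 4 MHz one instruction cycle is 1 µs, so a result near 1,000,000 means about one second. With the constants from the listing the model lands within a few tens of cycles of that figure; changing `na_lo` is the finest adjustment, at three cycles per step.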


Stack Recursive Delay

The name of the label you call indicates the number of cycles that will execute before it returns (convert cycles to time for your clock speed). This uses an amazingly small number of bytes and is probably more efficient than inline loops or NOPs if you are doing a lot of little delays of varying lengths throughout your code.


Delay131072	call	Delay16384	; uses 6 stack levels
Delay114688	call	Delay16384
Delay98304	call	Delay16384
Delay81920	call	Delay16384
Delay65536	call	Delay16384
Delay49152	call	Delay16384
Delay32768	call	Delay16384
Delay16384	call	Delay2048	;uses 5 stack levels
Delay14336	call	Delay2048
Delay12288	call	Delay2048
Delay10240	call	Delay2048
Delay8192	call	Delay2048
Delay6144	call	Delay2048
Delay4096	call	Delay2048
Delay2048	call	Delay256	;uses 4 stack levels
Delay1792	call	Delay256
Delay1536	call	Delay256
Delay1280	call	Delay256
Delay1024	call	Delay256
Delay768	call	Delay256
Delay512	call	Delay256
Delay256	call	Delay32	;uses 3 stack levels
Delay224	call	Delay32
Delay192	call	Delay32
Delay160	call	Delay32
Delay128	call	Delay32
Delay96	call	Delay32
Delay64	call	Delay32
Delay32	call	Delay4	;uses 2 stack levels
Delay28	call	Delay4
Delay24	call	Delay4
Delay20	call	Delay4
Delay16	call	Delay4
Delay12	call	Delay4
Delay8	call	Delay4
Delay4	ret	;uses 1 stack level
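
The label arithmetic can be checked by stepping through a model of the table. The Python sketch below rebuilds the pattern (groups of seven calls, each group calling the first line of the group beneath it, with a single shared `ret` at the bottom) and counts cycles and stack depth for any entry point. It assumes `call` and `ret` take two cycles each, as on the PIC; the function names and the index-based addressing are hypothetical.

```python
def build_listing(levels=5, calls_per_level=7):
    """Return the table as a list of ('call', target_index) entries followed by
    one ('ret',). Index 0 is the topmost label; the last index is the ret."""
    listing = []
    for i in range(levels * calls_per_level):
        # every call in a group targets the first line of the next group down
        target = (i // calls_per_level + 1) * calls_per_level
        listing.append(("call", target))
    listing.append(("ret",))
    return listing


def delay_cycles(listing, entry):
    """Return (cycles, peak stack depth) for calling the label at index `entry`."""
    stack = [None]                    # return address of the original caller
    cycles, pc, peak = 2, entry, 1    # the initial call costs 2 cycles
    while True:
        op = listing[pc]
        cycles += 2                   # call and ret both take 2 cycles
        if op[0] == "call":
            stack.append(pc + 1)
            peak = max(peak, len(stack))
            pc = op[1]
        else:
            pc = stack.pop()
            if pc is None:            # returned to the original caller
                return cycles, peak
```

With the default five groups, index 35 is the bare `ret` (4 cycles, one stack level), index 28 is the first line of the bottom group (32 cycles), and index 0 is the top of the table (131072 cycles, six stack levels). Each group multiplies the delay by eight because seven explicit calls plus the fall-through into the group below add up to eight copies of the lower delay.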

Macro Example

James Newton says:

I also wrote this god-awful macro for the Parallax SX-Key assembler that may be of use to someone. It produces the smallest and/or tightest inline loop delays when you invoke it with the value and units of time you wish to delay for, calculating the number of cycles from the processor speed. The CpuMhz equate must be adjusted to whatever is right for your chip; it should probably be called CpuMips rather than CpuMhz. The only redeeming virtue is that it uses nothing other than W (on the SX you can decrement W): no stack use, no register use.
	device	pins28, pages1, banks8, turbo, stackx, optionx, carryx
	reset reset_entry
CpuMhz = 50
temp	ds	1
usec	equ	-6
msec	equ	-3
sec	equ	1
cycles	equ	0

mynop	MACRO
	noexpand
	page $>>8
	ENDM

cyclefor MACRO 1
_cycles = \1
IF _cycles > 0
_temp	=	$//4
IF _temp = 2
 IF _cycles < 5
  REPT _cycles
   expand
   mynop
   noexpand
   ENDR
  _cycles = 0
 ELSE
  expand
  mynop
  noexpand
  _cycles = _cycles -1
  ENDIF
 ENDIF
IF _temp = 1
 IF _cycles < 6
  REPT _cycles
   expand
   mynop
   noexpand
   ENDR
  _cycles = 0
 ELSE
  expand
  mynop
  mynop
  noexpand
  _cycles = _cycles -2
  ENDIF
 ENDIF
IF _cycles > 3
	expand
	 mov w, #_cycles / 3
	 decsz 1	;dec w
	 clrb 2.1       ;modify PC to jump back
	noexpand
 _cycles = _cycles // 3 ;cycles left over
 ENDIF
IF _cycles > 0
  REPT	_cycles
	expand
	 mynop
	noexpand
   ENDR
 ENDIF
ENDIF
	ENDM

reset_entry
	mov w,#$7F
	mov !OPTION,w

delayhelp MACRO
	ERROR 'USAGE: delay value, [usec,msec,sec,cycles]'
	ENDM

delay	MACRO	2
noexpand
IF (\2=usec OR \2=msec OR \2=sec OR \2=cycles) AND (\1<1000 AND \1>0)
 IF \2=sec
  _cycles = (\1 * 1000000 / (1000/CpuMhz))
  ENDIF
 IF \2=msec
  _cycles = (\1 * 1000 / (1000/CpuMhz))
  ENDIF
 IF \2=usec
  _cycles = (\1 / (1000/CpuMhz))
  ENDIF
 IF \2=cycles
  _cycles = \1	;the count is the first argument; \2 is only the unit
  ENDIF
 IF _cycles = 0
  expand
	 ;delay less than one cycle at this processor speed
  noexpand
 ELSE
  IF _cycles > 255
   REPT (_cycles / 256)
    cyclefor 256
    _cycles = _cycles - 256
    ENDR
   ENDIF
  cyclefor _cycles
  ENDIF

ELSE
	delayhelp
ENDIF
	ENDM

	delay 999, usec
	delay 200, usec
	delay 250, usec
	delay 10, usec
	delay 20, usec
	delay 100, msec