James Cameron says:

These PICs are just too fast; we often need a way to have the processor wait around for a while. A series of NOP (no operation) instructions is straightforward, but wastes instruction memory. A counted loop is the next common trick, because its count lets us tune the delay.

But what are some of the more exotic ways to delay?

  1. The Common NOP, a delay for one instruction cycle

            DELAY   NOP                     ; delay one cycle
    

    Advantages: extremely simple, obvious, maintainable, scalable.
    Disadvantages: is it really there for a reason? Without a comment, it looks like dead code.

  2. The Common Loop, three times the initial value plus one

            DELAY   MOVLW   D'95'           ; 286 cycle delay (3 * 95 + 1)
                    MOVWF   COUNTER
                    DECFSZ  COUNTER,F
                    GOTO    $-1
    

    Advantages: fairly simple, maintainable, scalable up to 769 cycles (an initial value of 0 gives 256 passes); beyond that, use a nested loop.
    Disadvantages: costs one file register, though there is a variant using just the W register.

  3. The Novel GOTO, two instruction cycles in one instruction

                    GOTO   $+1              ; two cycle delay
    

    Advantages: half the space of two NOPs.
    Disadvantages: obscure unless commented.

  4. The CALL to Nowhere, four instruction cycles

                    ORG     0
                    GOTO    MAIN
            FOUR    RETURN                  ; four cycle delay function
                    [...]
            DELAY   CALL    FOUR
    

    Advantages: a quarter of the space of four NOPs; the RETURN can be reused by other code; a good use for the three program words between the reset vector (0x000) and the interrupt vector (0x004) on a PIC 16F84.
    Disadvantages: implementation separate from use, can look odd, uses one stack level.

    Scott Dattalo says:

    ...if you want a 4-cycle single-instruction delay: call some_return_in_your_code. Of course, you run the risk of stack overflow on [some processors].

  5. The Double Call to Nowhere, eight or four cycles

                    ORG     0
                    GOTO    MAIN
            EIGHT   CALL    FOUR
            FOUR    RETURN
                    [...]
            DELAY   CALL    EIGHT
    

    Advantages: looks simple, allows various size delays to be rapidly constructed during prototyping.
    Disadvantages: uses two stack levels.

    [ed: See the "Stack Recursive Delay" below]

  6. The Do Something Useful Extension to the Common Loop, five times the initial value plus one

            DELAY   MOVLW   D'95'           ; 476 cycle delay (5 * 95 + 1)
                    MOVWF   COUNTER
                    MOVF    TRISM,W         ; fetch TRIS mirror into W
                    TRIS    PORTB           ; reapply it to port B
                    DECFSZ  COUNTER,F
                    GOTO    $-3             ; back to the MOVF
    

    Advantages: good for precise delays that are not multiples of three; allows useful functionality to be placed within the delay.
    Disadvantages: increased convolution of code, lower maintainability.

  7. The Long Delay Using a Timer: more of a technique than a code fragment. Set the timer and wait for it to roll over.

    Advantages: immune to distortion by interrupts, easily scaled using a prescaler, even possible to tune the delay by modifying the preload value.
    Disadvantages: allocates a timer.

    Paul B. Webster says:

    I feel that many applications (real-time control, clocks) resolve to a fundamental "tick", or a hierarchy of ticks, often around a millisecond, which method 7 provides. Counting a thousand of these gives a second, at which point a train of countdowns (semaphores) can trigger various housekeeping actions, e.g. if T1 then { T1--; if T1 == 0 then action1 };

    The 1 ms ticks also drive debouncing: a counter counts down from 20 (20 ms) to verify a keypress or release. Other millisecond countdowns are used for playing tunes or for "tick", "click" or "blip" noises.

    It is highly undesirable to preload or fiddle with TMR0 when you are using it this way. Firstly, writing it interferes with the prescaler in a very inconvenient fashion. Secondly, you lose free tone generation: the 1 ms countdowns can be used for a 500 Hz tone while the TMR0 MSB can be copied to a port for a 1 kHz tone, bit 6 for a 2 kHz tone, and so on, each controlled as above in even numbers of milliseconds.

    You can similarly wait on those individual bits by polling, for sub-delays. A sub-delay on bit 3 toggling could be used to clock a phase accumulator at 32 kHz to generate quite a complete range of square-wave tones.

    So, you may say that method "uses" a timer-counter, but I submit that in a well-designed application, you get an awful lot of "use" out of it!

  8. The Watchdog Delay: go to sleep and get woken by the watchdog.

    Advantages: extremely simple technique, can be varied by changing the WDT prescaler ratio.
    Disadvantages: difficult to calibrate.

    [ed: Set up a type of state machine that records where the processor should continue executing code after the reset. Make sure you clear it in your non-timed routines. Also see "Using the watchdog timer to sense temperature".]

  9. The Data EEPROM Delay: a delay of typically 10 ms, triggered by writing to data EEPROM and waiting for the write-complete interrupt.

    Advantages: tests the endurance of the EEPROM.
    Disadvantages: tests the endurance of the EEPROM.
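The cycle arithmetic behind methods 2, 6 and 7 can be checked with a few lines of Python. This is a sketch, not part of the original page. It assumes standard mid-range PIC timings: DECFSZ takes 1 cycle, or 2 when it skips; GOTO takes 2; MOVLW, MOVWF, MOVF and TRIS take 1 each. The timer helpers assume the 16F84's 8-bit TMR0 with a 1:2 to 1:256 prescaler. The 4.096 MHz crystal in the usage lines is an assumed example, chosen because a free-running TMR0 with a 1:4 prescaler then overflows exactly once per millisecond. All function names are hypothetical.

```python
"""Cycle arithmetic for the counted-loop and timer delays (a sketch)."""

# Assumed mid-range PIC instruction timings, in instruction cycles.
DECFSZ_NO_SKIP = 1  # result non-zero, next instruction executes
DECFSZ_SKIP = 2     # result zero, next instruction is skipped
GOTO = 2
SETUP = 2           # MOVLW n + MOVWF COUNTER


def counted_loop_cycles(n, body=0):
    """Cycles for MOVLW n / MOVWF COUNTER / <body> / DECFSZ COUNTER,F / GOTO.

    body counts single-cycle instructions inside the loop: 0 for the
    Common Loop, 2 for the MOVF/TRIS pair. n == 0 gives 256 passes.
    """
    passes = n if n else 256
    every_pass_but_last = body + DECFSZ_NO_SKIP + GOTO
    last_pass = body + DECFSZ_SKIP  # falls through instead of looping
    return SETUP + (passes - 1) * every_pass_but_last + last_pass


TMR0_PRESCALERS = (1, 2, 4, 8, 16, 32, 64, 128, 256)  # 1 = prescaler on the WDT


def tmr0_setting(target_cycles):
    """Smallest prescaler and preload giving exactly target_cycles per overflow.

    Ignores the brief increment inhibit that follows a write to TMR0.
    Returns None when no exact setting exists.
    """
    for prescaler in TMR0_PRESCALERS:
        counts, remainder = divmod(target_cycles, prescaler)
        if remainder == 0 and 1 <= counts <= 256:
            return prescaler, 256 - counts  # preload 0 means 256 counts
    return None


def tmr0_bit_hz(fosc_hz, prescaler, bit):
    """Square-wave frequency on one bit of a free-running TMR0."""
    counts_per_second = fosc_hz / 4 / prescaler  # 4 clocks per instruction cycle
    return counts_per_second / 2 ** (bit + 1)


if __name__ == "__main__":
    print(counted_loop_cycles(95))          # Common Loop with D'95'
    print(counted_loop_cycles(0))           # longest single loop
    print(counted_loop_cycles(95, body=2))  # with the MOVF/TRIS pair
    print(tmr0_setting(1000))               # 1 ms at 4 MHz
    for bit in (7, 6, 3):
        print(bit, tmr0_bit_hz(4_096_000, 4, bit))
```

Run as a script, this prints 286, 769 and 476 for the loops, (4, 6) for a 1 ms preload at 4 MHz, and 1000.0, 2000.0 and 16000.0 Hz for TMR0 bits 7, 6 and 3. Bit 3's 16 kHz square wave toggles 32,000 times a second, which matches the 32 kHz phase-accumulator clock mentioned above.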

Tony Nixon says:

Here's a simple example [for the 16F84 at 4 MHz], but it won't be accurate to the second, so you may have to tweak it. Perhaps you could use it as a basis for your code.

        cblock  0x0C    ; 16F84 general-purpose RAM starts at 0x0C
        hours, mins, secs
        NbHi, NbLo, NaHi, NaLo
        endc

        movlw x   ; x = hours delay (placeholder)
        movwf hours
        call Hours_Delay

        ; rest of code continues

Hours_Delay
        call Hour_Delay
        decfsz hours,f
        goto Hours_Delay
        return

Hour_Delay
        movlw d'60'
        movwf mins
Rst_Loop
        movlw d'60'
        movwf secs
Hour_Loop
        call Second_Delay
        ;
        ; maybe some processing in here
        ;
        decfsz secs,f
        goto Hour_Loop
        decfsz mins,f
        goto Rst_Loop
        return

Second_Delay            ; about 1,000,000 cycles, i.e. about 1 s at 4 MHz
        movlw 01h
        movwf NbHi
        movlw 06h
        movwf NbLo
        movlw 13h
        movwf NaHi
        movlw 0xB5
        movwf NaLo
DeLoop0
        decfsz NaLo,f
        goto DeLoop0
        decfsz NaHi,f
        goto DeLoop0
        decfsz NbLo,f
        goto DeLoop0
        decfsz NbHi,f
        goto DeLoop0
        return
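To see how close Second_Delay comes to one second, its DECFSZ cascade can be simulated. The Python sketch below is not from the original post. It assumes standard mid-range timings: DECFSZ takes 1 cycle, or 2 when it skips, and GOTO, CALL and RETURN take 2 each. It also assumes a 4 MHz clock, so one instruction cycle is 1 µs. cascade_cycles and second_delay_cycles are hypothetical helper names.

```python
"""Simulate the DECFSZ cascade used by Second_Delay (a sketch)."""

CALL = RETURN = GOTO = 2
DECFSZ_NO_SKIP, DECFSZ_SKIP = 1, 2


def cascade_cycles(counters):
    """Cycles spent from DeLoop0 until the outermost counter reaches zero.

    counters holds the initial register values, innermost first,
    e.g. [NaLo, NaHi, NbLo, NbHi].
    """
    regs = list(counters)
    cycles = 0
    while True:
        for level in range(len(regs)):
            regs[level] = (regs[level] - 1) & 0xFF  # 8-bit register, 0 wraps to 255
            if regs[level] != 0:
                cycles += DECFSZ_NO_SKIP + GOTO  # back to DeLoop0
                break
            cycles += DECFSZ_SKIP  # skip the GOTO, try the next counter
        else:
            return cycles  # every counter hit zero: fall through to RETURN


def second_delay_cycles():
    """Total cost of CALL Second_Delay, including setup and RETURN."""
    setup = 8  # four MOVLW/MOVWF pairs
    loop = cascade_cycles([0xB5, 0x13, 0x06, 0x01])  # NaLo, NaHi, NbLo, NbHi
    return CALL + setup + loop + RETURN


if __name__ == "__main__":
    total = second_delay_cycles()
    print(total, "cycles =", total / 1_000_000, "s at 4 MHz")
```

By this count, one call takes 1,000,030 cycles, about 30 µs more than a second, before the Hour_Loop and Hours_Delay overhead is added. This is the kind of drift the note above suggests tweaking out.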


Stack Recursive Delay

The name of the label you call indicates the number of cycles (see "Convert cycles to time") that will be executed before returning, counting the CALL itself. This uses an amazingly small amount of program memory and is probably more efficient than inline loops or NOPs if you are doing a lot of little delays of varying lengths throughout your code.


Delay131072     call Delay16384	; uses 6 stack levels
Delay114688     call Delay16384
Delay98304      call Delay16384
Delay81920      call Delay16384
Delay65536      call Delay16384
Delay49152      call Delay16384
Delay32768      call Delay16384
Delay16384      call Delay2048	;uses 5 stack levels
Delay14336      call Delay2048
Delay12288      call Delay2048
Delay10240      call Delay2048
Delay8192       call Delay2048
Delay6144       call Delay2048
Delay4096       call Delay2048
Delay2048       call Delay256	;uses 4 stack levels
Delay1792       call Delay256
Delay1536       call Delay256
Delay1280       call Delay256
Delay1024       call Delay256
Delay768        call Delay256
Delay512        call Delay256
Delay256        call Delay32	;uses 3 stack levels
Delay224        call Delay32
Delay192        call Delay32
Delay160        call Delay32
Delay128        call Delay32
Delay96         call Delay32
Delay64         call Delay32
Delay32         call Delay4	;uses 2 stack levels
Delay28         call Delay4
Delay24         call Delay4
Delay20         call Delay4
Delay16         call Delay4
Delay12         call Delay4
Delay8          call Delay4
Delay4          return		;uses 1 stack level
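A fall-through table like this is easy to get wrong, because every label silently depends on all the lines below it. A small Python model can rebuild the table, follow each CALL and fall-through, and confirm that every label's name matches its cycle count and stack use. This is a sketch, not part of the original page. It assumes CALL and RETURN take 2 cycles each, and build_table, call_cycles and stack_levels are hypothetical names. It also shows what happens if an extra entry such as `Delay48 call Delay32` sits just above Delay32: that label would take 64 cycles, and every label above it in the 32-cycle group would grow by 32.

```python
"""Check the Stack Recursive Delay table by modelling it (a sketch)."""

CALL = RETURN = 2  # instruction cycles


def build_table():
    """(label, callee) pairs top to bottom; callee None is the lone RETURN."""
    table = []
    for callee in (16384, 2048, 256, 32, 4):
        for k in range(8, 1, -1):  # labels 8*callee down to 2*callee
            table.append((f"Delay{k * callee}", f"Delay{callee}"))
    table.append(("Delay4", None))
    return table


def call_cycles(table, label):
    """Cycles for 'call label', counting the CALL and the final RETURN."""
    index = {name: i for i, (name, _) in enumerate(table)}
    memo = {}

    def run(i):
        # Cycles from line i until its RETURN completes (outer CALL excluded).
        if i not in memo:
            callee = table[i][1]
            if callee is None:
                memo[i] = RETURN
            else:
                memo[i] = CALL + run(index[callee]) + run(i + 1)
        return memo[i]

    return CALL + run(index[label])


def stack_levels(table, label):
    """Deepest stack use of 'call label', including its own return address."""
    index = {name: i for i, (name, _) in enumerate(table)}
    memo = {}

    def depth(i):
        # Extra stack levels pushed while executing from line i onwards.
        if i not in memo:
            callee = table[i][1]
            if callee is None:
                memo[i] = 0
            else:
                memo[i] = max(1 + depth(index[callee]), depth(i + 1))
        return memo[i]

    return 1 + depth(index[label])


if __name__ == "__main__":
    table = build_table()
    for name, _ in table:
        assert call_cycles(table, name) == int(name[len("Delay"):]), name
    print("all", len(table), "labels match their names")
    print("Delay131072 uses", stack_levels(table, "Delay131072"), "stack levels")

    # An extra 'Delay48 call Delay32' entry just above Delay32 breaks the chain.
    bad = build_table()
    bad.insert(bad.index(("Delay32", "Delay4")), ("Delay48", "Delay32"))
    print("Delay48 ->", call_cycles(bad, "Delay48"), "cycles")
    print("Delay64 ->", call_cycles(bad, "Delay64"), "cycles")
```

Memoising by line index works because each line always executes the same sequence from that point to its RETURN. The stack-level results match the comments in the table (1 for Delay4 up to 6 for the 131072 group).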

Macro Example

James Newton says:

I also wrote this god-awful macro for the Parallax SX-Key assembler that may be of use to someone. It produces the smallest and/or tightest loop delays inline when you call it with the value and units of time you wish to delay for, calculating the number of cycles from the processor speed. The CpuMhz equate must be adjusted to whatever is right for your chip; it should probably be called CpuMips instead of CpuMhz. Its only redeeming virtue is that it uses nothing other than W (on the SX you can decrement W): no stack use, no register use.
	device	pins28, pages1, banks8, turbo, stackx, optionx, carryx
	reset reset_entry
CpuMhz = 50
temp	ds	1
usec	EQU	-6
msec	EQU	-3
sec	EQU	1
cycles	EQU	0

mynop	MACRO
	noexpand
	page $>>8
	ENDM

cyclefor MACRO 1
_cycles = \1
IF _cycles > 0
_temp	=	$//4
IF _temp = 2
 IF _cycles < 5 
  REPT _cycles 
   expand 
   mynop 
   noexpand 
   ENDR 
  _cycles = 0 
 ELSE  
  expand 
  mynop 
  noexpand 
  _cycles = _cycles -1 
  ENDIF 
 ENDIF 
IF _temp = 1 
 IF _cycles < 6 
  REPT _cycles 
   expand 
   mynop 
   noexpand 
   ENDR 
  _cycles = 0 
 ELSE 
  expand 
  mynop 
  mynop 
  noexpand 
  _cycles = _cycles -2 
  ENDIF 
 ENDIF 
IF _cycles > 3
	expand
	 mov w, #_cycles / 3
	 decsz 1	;dec w
	 clrb 2.1       ;modify PC to jump back
	noexpand
 _cycles = _cycles // 3 ;cycles left over
 ENDIF
IF _cycles > 0
  REPT	_cycles
	expand
	 mynop
	noexpand
   ENDR
 ENDIF
ENDIF
	ENDM

reset_entry
	mov w,#$7F
	mov !OPTION,w

delayhelp MACRO
	ERROR 'USAGE: delay value, [usec,msec,sec,cycles]'
	ENDM

delay	MACRO	2
noexpand
IF (\2=usec OR \2=msec OR \2=sec OR \2=cycles) AND (\1<1000 AND \1>0)
 ; CpuMhz is used as instructions per microsecond
 IF \2=sec
  _cycles = (\1 * 1000000 * CpuMhz)
  ENDIF
 IF \2=msec
  _cycles = (\1 * 1000 * CpuMhz)
  ENDIF
 IF \2=usec
  _cycles = (\1 * CpuMhz)
  ENDIF
 IF \2=cycles
  _cycles = \1
  ENDIF
 IF _cycles = 0
  expand
	 ;delay less than one cycle at this processor speed
  noexpand
 ELSE
  IF _cycles > 255
   REPT (_cycles / 256)
    cyclefor 256
    _cycles = _cycles - 256
    ENDR
   ENDIF
  cyclefor _cycles
  ENDIF

ELSE
	delayhelp
ENDIF
	ENDM

	delay 999, usec
	delay 200, usec
	delay 250, usec
	delay 10, usec
	delay 20, usec
	delay 100, msec
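The arithmetic the macro performs before emitting any code can be sketched in Python. This is a model, not the macro itself. It assumes, as suggested above, that CpuMhz really means instructions per microsecond. It reproduces only the unit conversion, the 1 to 999 range check and the split into 256-cycle chunks, not the address-dependent padding inside cyclefor. to_cycles and chunks are hypothetical names.

```python
"""Model of the delay macro's front end: units to cycles, then 256-cycle chunks."""

CPU_MHZ = 50  # the macro's CpuMhz, treated as instructions per microsecond

# Microseconds per unit; None means the value is already in cycles.
UNIT_SCALE = {"cycles": None, "usec": 1, "msec": 1_000, "sec": 1_000_000}


def to_cycles(value, unit, cpu_mhz=CPU_MHZ):
    """Instruction cycles for 'delay value, unit' (value must be 1..999)."""
    if unit not in UNIT_SCALE or not 0 < value < 1000:
        raise ValueError("USAGE: delay value, [usec,msec,sec,cycles]")
    scale = UNIT_SCALE[unit]
    return value if scale is None else value * scale * cpu_mhz


def chunks(cycles):
    """Split a delay the way the macro does: full 256s, then the remainder."""
    full = [256] * (cycles // 256)
    rest = cycles - 256 * len(full)
    return full + ([rest] if rest else [])


if __name__ == "__main__":
    for value, unit in [(10, "usec"), (20, "usec"), (100, "cycles")]:
        n = to_cycles(value, unit)
        print(value, unit, "->", n, "cycles in", len(chunks(n)), "inline chunk(s)")
```

The chunk count shows why long delays get expensive with this approach. Each 256-cycle chunk is a separate inline loop, so a 1 ms delay at 50 MHz (50,000 cycles) expands to 196 chunks.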