> P.S. In my thinking about this problem I stumbled upon the following:
>
> delay64   call delay32
> delay32   call delay16
> delay16   call delay8
> delay8    call delay4
> delay4    return

For reasonably short delays, I prefer to use the routine below [though normally I only expand it to Delay1024]. Its length is 36 instruction words, but it uses no F-registers, and calling it even at the 128K level is usually manageable on the stack (6 locations). When called for more typical delay values (e.g. 2048), the stack usage is often well within what's available.

Delay131072::   call    Delay16384
DelayXXX::      call    Delay16384
DelayXXY::      call    Delay16384
DelayXYX::      call    Delay16384
Delay65536::    call    Delay16384
Delay49152::    call    Delay16384
Delay32768::    call    Delay16384
Delay16384::    call    Delay2048
Delay14336::    call    Delay2048
Delay12288::    call    Delay2048
Delay10240::    call    Delay2048
Delay8192::     call    Delay2048
Delay6144::     call    Delay2048
Delay4096::     call    Delay2048
Delay2048::     call    Delay256
Delay1792::     call    Delay256
Delay1536::     call    Delay256
Delay1280::     call    Delay256
Delay1024::     call    Delay256
Delay768::      call    Delay256
Delay512::      call    Delay256
Delay256::      call    Delay32
Delay224::      call    Delay32
Delay192::      call    Delay32
Delay160::      call    Delay32
Delay128::      call    Delay32
Delay96::       call    Delay32
Delay64::       call    Delay32
Delay32::       call    Delay4
Delay28::       call    Delay4
Delay24::       call    Delay4
Delay20::       call    Delay4
Delay16::       call    Delay4
Delay12::       call    Delay4
Delay8::        call    Delay4
Delay4::        return
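
For anyone who wants to check the numbers, here is a rough Python model of the ladder. It is only a sketch under two assumptions that the post does not state outright: `call` and `return` each cost 2 instruction cycles (as on mid-range PIC cores), and each tier is exactly seven `call`s that fall through into the next tier, which is what gives the 36-instruction length quoted. The names `build_ladder` and `run` are made up for this illustration.

```python
# Model of the fall-through delay ladder.
# Assumption: `call` and `return` each take 2 instruction cycles.
CALL_CYCLES = 2
RETURN_CYCLES = 2
TIER_BASES = [16384, 2048, 256, 32, 4]  # label targeted by each tier of 7 calls


def build_ladder():
    """Return (program, labels): 7 calls per tier, then a single return."""
    program, labels = [], {}
    for base in TIER_BASES:
        program.extend([("call", base)] * 7)
        labels[base] = len(program)  # the label sits right after its 7 callers
    program.append(("return", None))
    return program, labels


def run(entry):
    """Simulate `call <entry address>`; return (total cycles, peak stack depth)."""
    program, labels = build_ladder()
    stack = [None]        # None marks the original caller's return address
    cycles = CALL_CYCLES  # the caller's own call instruction
    peak = 1
    pc = entry
    while True:
        op, target = program[pc]
        if op == "call":
            cycles += CALL_CYCLES
            stack.append(pc + 1)          # return to the next instruction
            peak = max(peak, len(stack))
            pc = labels[target]
        else:
            cycles += RETURN_CYCLES
            pc = stack.pop()
            if pc is None:                # back in the original caller
                return cycles, peak


if __name__ == "__main__":
    program, labels = build_ladder()
    print(len(program))        # 36
    print(run(0))              # (131072, 6)  entry at Delay131072
    print(run(labels[2048]))   # (2048, 4)    entry at Delay2048
```

Under those assumptions each tier takes 8 times as long as the one below it (seven calls plus the fall-through), so the label names match the cycle counts, and the deepest entry point needs 6 stack locations including the caller's return address.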