Hello Scott.

> If you add two horizontal registers together,
> only two instructions are needed:
>
>     movf   r1,w
>     addwf  r2,f
>
> If you're implementing phase accumulators, you
> need the msb of the sum and:
>
>     movf   r1,w
>     addwf  r2,f
>     rlf    r2,w
>     rlf    phase,f
>
> If you're implementing multibyte adders (e.g. 16 bits) you need 6
> instructions for the addition. If you're using this as phase accumulator
> then two more instructions are needed to extract the msb. The number of
> instructions required for say 8 counters is:
>
>     (6 + 2) * 8 = 80

(6 + 2) * 8 = 64? Where were the other cycles lost?

> With Dmitry's 6-cycle per stage vertical adder (actually the first stage
> only needs to be two cycles) you need:
>
>     6 * (stages -1 ) + 2 instructions
>
> And for 16 bits that comes out to 92 instructions. For 14 bits, the
> horizontal and vertical counters are equivalent and for fewer than 14 (but
> more than 8) the vertical is faster!

Actually, I've realized another interesting thing that makes vertical
counters faster and less memory-hungry than horizontal ones.

Phase accumulators are usually used to generate sine and cosine square
waves. Recall the following trick. Let X step through 00, 01, 10, 11.
Then X.1 follows the sine ( 0_0_1_1_.. ) and X.1 XORed with X.0 follows
the cosine ( 0_1_1_0_.. ). So generating both sine and cosine requires
only one additional stage of the vertical counter.

With 7-bit counters, the horizontal version looks like this:

    movlw  const_1      ; phase adding
    addwf  count_1s,f   ; advance the sine counter
    addwf  count_1c,f   ; advance the cosine counter
    rlf    count_1s,w   ; msb of the sine counter -> carry
    rlf    phase_l,f    ; shift carry into phase_l
    rlf    count_1c,w   ; msb of the cosine counter -> carry
    rlf    phase_l,f    ; shift carry into phase_l

That is 7 clocks per sine-and-cosine operation, so for 8 counters:

    7 * 8 = 56

and it requires 2*8 + (1 or 2) (temp_phase) cells of memory.

The vertical implementation (8 + 1 stages in all) takes:

    6 * 8 + 2 + 1 (additional xorwf to obtain the cosine) = 51

and requires only 9 cells of memory.

There is probably a way to achieve much better performance once we
understand what John proposed in his 3-clock routine.

WBR Dmitry.

PS. Playing with RISC uCs is a pleasure ;)
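
A quick sanity check of the quadrature trick and the clock counts above, written as a small Python sketch rather than PIC code. The function names (quadrature, vertical_cycles, horizontal_cycles) are made up for illustration only; the cycle formula is the one quoted in this thread, and the per-counter cost of 7 is taken from the horizontal listing.

```python
def quadrature(x):
    """Return (sine_bit, cosine_bit) for a phase value x.

    Only the two lowest bits of x are used here: X.1 is the sine
    square wave, X.1 XOR X.0 is the cosine square wave.
    """
    b1 = (x >> 1) & 1
    b0 = x & 1
    return b1, b1 ^ b0


def vertical_cycles(stages, extra=0):
    # Formula quoted in the thread: the first stage costs 2 cycles,
    # every further stage costs 6. 'extra' is a hypothetical knob for
    # add-ons such as the single xorwf that produces the cosine.
    return 6 * (stages - 1) + 2 + extra


def horizontal_cycles(per_counter, counters=8):
    # Horizontal counters are processed one at a time.
    return per_counter * counters


sine = [quadrature(x)[0] for x in range(4)]
cosine = [quadrature(x)[1] for x in range(4)]
print("sine  :", sine)
print("cosine:", cosine)
print("horizontal:", horizontal_cycles(7))
print("vertical  :", vertical_cycles(9, extra=1))
```

The same vertical_cycles formula gives 92 for 16 stages and 80 for 14 stages, which matches the figures in the quoted text.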