SX Microcontroller Math Method

Multiply 8 x 8 bits

Scott says:

The other day I stated that there is a faster multiplication algorithm than the one posted by Martin, which is a variation of the one on James' Web page, which in turn is a variation of the one found in three different places in the ECH. (Martin's routine is one cycle shorter; however, if you allow the other routines to pass one of the multiplicands in W, then they'd be one cycle shorter too.) The algorithm to which I referred is also found in the ECH, or at least I thought it was; I was unable to find it. (Perhaps I was dreaming...) But it goes something like this:
If the first bit tested in the shift-and-add multiplication algorithm is zero, then there's no need to perform the shift-and-add operation for the first iteration. If the next bit is zero too, you can skip that one as well. The first non-zero bit encountered doesn't need to be added, but it does need to be shifted.
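The idea can be sketched in a few lines of Python before looking at the assembly. This is only an illustrative model, not the routine itself: the function name `multiply_skip_leading_zeros` and the single 17-bit accumulator `acc` are assumptions made for the sketch. The accumulator starts out holding x in its high byte, which stands in for the first addition.

```python
def multiply_skip_leading_zeros(x, y):
    """8 x 8 -> 16 bit shift-and-add that skips the low-order zero bits of y."""
    assert 0 <= x <= 0xFF and 0 <= y <= 0xFF
    if y == 0:
        return 0

    # Find the lowest set bit of y; every iteration below it is skipped.
    first = 0
    while not (y >> first) & 1:
        first += 1

    # Carry + high byte + low byte in one integer. Preloading the high
    # byte with x replaces the first "0 + x" addition.
    acc = x << 8
    acc >>= 1                      # the first set bit still needs its shift

    for bit in range(first + 1, 8):
        if (y >> bit) & 1:
            acc += x << 8          # add x into the high byte
        acc >>= 1                  # shift the whole result right one place
    return acc
```

Skipping the early shifts is safe because they would only be shifting zeros; the bits that matter still receive the same number of shifts as in the plain algorithm.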
Here's an example of the algorithm, though I'm not sure it is optimal. It has a worst-case execution time of 36 cycles (excluding the call and return), a best-case execution time of 21 cycles, and an average of right around 34 cycles. So on average it saves one cycle over the other inline multiplication functions, but it takes 50% more code to do so. Unless I were desperate to save one cycle, I'd probably stick with the other versions.

;
; Multiply x*y and produce a 16-bit result. The high byte of the
; result is aliased with x; the low byte is returned in res_lo.
; y is not modified.
;

multiply
	mov	W, x	;; or save a cycle by letting the caller init. ;)

	clrb	C		;the first rr x must shift in a zero
	clr	res_lo		;low byte of the product starts at zero

	snb	y.0
	jmp	l0

	snb	y.1
	jmp	l1

	snb	y.2
	jmp	l2

	snb	y.3
	jmp	l3

	snb	y.4
	jmp	l4

	snb	y.5
	jmp	l5

	snb	y.6
	jmp	l6

	snb	y.7
	jmp	l7

	clr	x	;Dmitry Kiryashov says: otherwise y==0 but x isn't

	jmp	l8	;; or return


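l0	rr	x		;first set bit: shift only, no add needed
	rr	res_lo

	snb	y.1
	add	x, W		;x = x + W (add original x to the high byte)

l1	rr	x
	rr	res_lo

	snb	y.2
	add	x, W

l2	rr	x
	rr	res_lo

	snb	y.3
	add	x, W

l3	rr	x
	rr	res_lo

	snb	y.4
	add	x, W

l4	rr	x
	rr	res_lo

	snb	y.5
	add	x, W

l5	rr	x
	rr	res_lo

	snb	y.6
	add	x, W

l6	rr	x
	rr	res_lo

	snb	y.7
	add	x, W

l7	rr	x		;carry from the last add rotates into the high byte
	rr	res_lo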


l8	ret
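
To check the routine without hardware, its register-level behaviour can be modelled in Python and run over all 65536 input pairs. This is a rough sketch under stated assumptions, not a cycle-accurate simulator: `sx_multiply` is a hypothetical name, the conditional add is assumed to accumulate into x (x = x + W), and the add is assumed not to include the carry-in.

```python
def sx_multiply(x, y):
    """Model the listing above; returns (high, low) as left in x and res_lo."""
    w = x            # mov W, x
    carry = 0        # clrb C
    res_lo = 0       # clr res_lo

    # The snb/jmp ladder: enter at label l<n> for the lowest set bit n of y.
    entry = next((b for b in range(8) if (y >> b) & 1), None)
    if entry is None:
        return 0, 0  # y == 0: clr x, res_lo is already zero

    for bit in range(entry, 8):
        # Past the entry label, each set bit of y adds W into x first.
        # Assumption: x = x + W with no carry-in; carry-out goes to C.
        if bit > entry and (y >> bit) & 1:
            total = x + w
            x, carry = total & 0xFF, total >> 8
        # rr x: rotate right through carry
        x, carry = (carry << 7) | (x >> 1), x & 1
        # rr res_lo: rotate right through carry
        res_lo, carry = (carry << 7) | (res_lo >> 1), res_lo & 1
    return x, res_lo


# Exhaustive check against ordinary multiplication.
for a in range(256):
    for b in range(256):
        hi, lo = sx_multiply(a, b)
        assert (hi << 8) | lo == a * b
```

When an add is skipped, the carry that the next rr x picks up comes from the preceding rr res_lo, which can only shift out one of the zero bits that res_lo started with, so no stray bit enters the high byte.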