>     If your coefficients are constants, you can write inline code
>     that's MUCH faster (and shorter) than a generic two-variable
>     multiply, even if the constant isn't a power of two... Start
>     with Microchip's unrolled code, then delete all the BTFSCs.
>     Next, remove all the ADDWFs except those which correspond to "1"
>     bits in the constant multiplier.
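
For anyone who wants to check the arithmetic on a PC first, here is roughly
what that recipe boils down to, as a C sketch rather than PIC assembly. The
function name mul164 is made up for illustration, and I'm assuming an 8-bit
input and a 16-bit result; 164 is binary 10100100, i.e. 128 + 32 + 4.

```c
#include <stdint.h>

/* Multiply by the constant 164 = 128 + 32 + 4 (binary 10100100).
   One shifted add per "1" bit of the constant; no bit tests at run time.
   Hypothetical helper: 8-bit input, 16-bit result (255 * 164 fits). */
static uint16_t mul164(uint8_t x)
{
    uint16_t v = x;
    return (uint16_t)((v << 7) + (v << 5) + (v << 2));
}
```

Each shifted term stands in for one surviving ADDWF in the unrolled routine;
the terms for the "0" bits are simply gone.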

You can often do even better than this by using "Booth's" algorithm.  It's
not worth the effort if the coefficient isn't constant, but if it is
constant, all the extra decision-making can be done by the programmer
before the program is ever compiled/run.

Although 164 (128+32+4) has no runs of "1" bits and so doesn't benefit from
Booth's algorithm, other constants do.  The trick is to rewrite any run of
three or more "1" bits as a higher "1" and a "minus one" (for instance,
16+8+4+2+1 = 32-1).  For example, to multiply by 159 ($9F), rather than
regarding that number as 128+16+8+4+2+1, it's much faster to regard it as
128+32-1; then instead of six "add"s you end up with two "add"s and a
subtract.  I'm not quite sure what carry ends up doing; you need to be a
little careful about that.  Nonetheless, it is in some cases a very useful
optimization to keep in mind.
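
To make the 159 example concrete, here is a small C sketch of both
decompositions side by side.  The names mul159_plain and mul159_booth are
made up for illustration, and I'm again assuming an 8-bit input and a 16-bit
result.  It's only a check of the arithmetic, not PIC code.

```c
#include <stdint.h>

/* 159 = 128 + 16 + 8 + 4 + 2 + 1: one add per "1" bit, six terms. */
static uint16_t mul159_plain(uint8_t x)
{
    uint16_t v = x;
    return (uint16_t)((v << 7) + (v << 4) + (v << 3) + (v << 2) + (v << 1) + v);
}

/* 159 = 128 + 32 - 1: the run 16+8+4+2+1 is rewritten as 32 - 1,
   leaving two adds and one subtract. */
static uint16_t mul159_booth(uint8_t x)
{
    uint16_t v = x;
    return (uint16_t)((v << 7) + (v << 5) - v);
}
```

Both functions give the same result for every 8-bit input.  Note that C hides
the carry question entirely: on an 8-bit part the subtract has to propagate
a borrow into the high byte, and this sketch doesn't model that.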