> > C is required to promote all operands to either "int" or "unsigned
> > int" before performing calculations if doing so will affect the result; C
> ^^^^^^^^^^^^^^^^^^^^^
> > (per both K&R and ANSI) mandates that "int" means a type whose range is
> > at least -32767 to 32767 and "unsigned int" is a type whose range is at
> > least 0 to 65535.

> It's in a variable - at compile time it doesn't know if it will affect
> the result.

If the compiler can't determine whether the promotion is necessary, the
promotion is required.  On the other hand, it's often fairly easy to
determine most of the cases where promotions can be eliminated.  I have
the beginnings of a PIC compiler I was working on which in fact does just
that.  The algorithm I used (not necessarily the best, but it works quite
adequately) is to build each expression into a tree with each node labeled
according to its maximum possible integer size (for example, a multiply
node's maximum size is the sum of its two descendants' sizes, up to a
maximum of 4).  Then it goes through on a second pass and evaluates each
node's size and that of its children; if a child node's integer size is
excessive given its parent's size, it will be downsized (for example, if
the result of a multiply will be placed in a two-byte integer, the
operands will be shrunk if they were larger than two bytes).
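
As a rough sketch of the two passes described above (this is not the actual compiler's code; the Node type, the node kinds, and the function names are all made up for illustration, and only leaves and multiplies are handled):

```c
#include <stddef.h>

#define MAX_BYTES 4   /* widest integer size, in bytes */

/* Hypothetical node kinds; a real compiler would have many more. */
typedef enum { NODE_LEAF, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int size;                    /* integer size in bytes */
    struct Node *left, *right;   /* NULL for leaves */
} Node;

/* Pass 1: label each node, bottom-up, with its maximum possible size. */
static int label_sizes(Node *n)
{
    if (n->kind == NODE_MUL) {
        int s = label_sizes(n->left) + label_sizes(n->right);
        n->size = s > MAX_BYTES ? MAX_BYTES : s;
    }
    /* A leaf keeps the declared size of its variable. */
    return n->size;
}

/* Pass 2: walk top-down and shrink any node that is wider than its
   parent can use.  This is only safe for operators whose low-order
   bytes depend solely on the low-order bytes of their operands
   (multiply, add, subtract); it would not be safe for a divide or a
   right shift. */
static void shrink_sizes(Node *n, int parent_size)
{
    if (n == NULL)
        return;
    if (n->size > parent_size)
        n->size = parent_size;
    shrink_sizes(n->left, n->size);
    shrink_sizes(n->right, n->size);
}
```

Calling label_sizes() on the root and then shrink_sizes() with the size of the assignment target gives the multiply-into-two-bytes behaviour from the example: a 4-byte operand of a multiply whose result lands in a two-byte integer ends up labeled as two bytes.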

There are a couple of tricky cases: most notably addition and subtraction.
The result of an add or subtract will be the size of the larger operand,
plus 1, EXCEPT that this "plus one" size increase won't propagate through
additional additions [e.g. in

{
  uint16 x,y,z;
  uint32 l;

  l=x+y+z;
}

the addition of x+y will be extended to 3 bytes.  The addition of that
3-byte quantity to the two-byte "z", however, will still be three bytes.]
Signed math also poses its own problems, and the code has no way of
knowing how big numbers will really get.  For example, in the code:

{
  uint8 x,y,z;
  uint32 l;

  l=(x+y)*z;
}

the compiler has no way of knowing whether it's actually necessary to
calculate the multiplication as 16x8->24, 16x8->16, or 8x8->16;
consequently, it will calculate it as the first (the widest) of these.
This is especially problematic in cases such as:

{
  uint8 t,u,v,w, y,z;

  z=(t+u)*(v+w) >> y;
}

In this case, the compiler has no choice but to process the multiplication
as 16x16->32 even though it's unlikely that the full 32 bits will be
needed.  On the other hand, such cases aren't terribly common and it's
probably better for the compiler to ensure that all results that fit
within the -2147483648..2147483647 range will be handled correctly [even
if slowly] than to produce code which will be faster but may produce
incorrect results.
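
To make the sizing rules concrete, here is a minimal sketch of them as two helper functions. The names are hypothetical, and treating a whole run of additions as one "chain" that gets a single extra byte is my reading of the rule as described above, not necessarily how the compiler represents it internally:

```c
#define MAX_BYTES 4   /* widest integer size, in bytes */

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Multiply: sum of the operand sizes, capped at MAX_BYTES. */
static int mul_size(int a, int b)
{
    return min_int(a + b, MAX_BYTES);
}

/* Chain of additions/subtractions: the widest operand plus one byte,
   with the extra byte counted once for the whole chain (assumption). */
static int add_chain_size(const int *sizes, int count)
{
    int widest = 0;
    for (int i = 0; i < count; i++)
        widest = max_int(widest, sizes[i]);
    return min_int(widest + 1, MAX_BYTES);
}
```

Applied to the three examples: x+y+z with two-byte operands comes out as 3 bytes; (x+y)*z with one-byte operands is mul_size(2, 1), i.e. 3 bytes (16x8->24); and (t+u)*(v+w) is mul_size(2, 2), i.e. 4 bytes (16x16->32).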

BTW, there is no support for unsigned 32 bit integers (though signed 32
and unsigned 24 are supported).  This restriction was necessary to ensure
that the type promotion rules behave reasonably; otherwise, it's unclear
what should happen to an expression such as

{
  uint16 x,y,z;
  int24 w;  /* signed */

  w = (x*y-z) / 512;
}

The (x*y-z) expression result may be anything from 4294836225 to -65535.
Any of those results would fit within "w" after the division, but the
compiler has no way of knowing whether the inner expression should be
promoted to an unsigned int32 (in which case results less than zero would
be miscomputed) or a signed int32 (in which case results over 2147483647
would be miscomputed).  Mandating that it will be a signed int32 replaces
this ambiguity with a simple rule: if all intermediate results are within
the proper range, the calculations will be done correctly.
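
The ambiguity can be demonstrated on a desktop compiler. The sketch below is only an illustration (the function names are made up, and int64_t is used purely as a reference for the exact answer); it computes the same 32 bits of (x*y-z) and then divides them under each interpretation:

```c
#include <stdint.h>

/* Low 32 bits of x*y - z, computed with wraparound (unsigned) math. */
static uint32_t low32(uint16_t x, uint16_t y, uint16_t z)
{
    return (uint32_t)x * y - z;
}

/* Reinterpret 32 bits as a two's-complement signed value, without
   relying on implementation-defined conversions. */
static int64_t as_signed32(uint32_t bits)
{
    return bits < 0x80000000u ? (int64_t)bits
                              : (int64_t)bits - 4294967296LL;
}

/* Exact answer, using a type wide enough for every possible input. */
static int64_t div512_exact(uint16_t x, uint16_t y, uint16_t z)
{
    return ((int64_t)x * y - z) / 512;
}

/* Intermediate treated as an unsigned 32-bit value. */
static int64_t div512_unsigned(uint16_t x, uint16_t y, uint16_t z)
{
    return (int64_t)(low32(x, y, z) / 512);
}

/* Intermediate treated as a signed 32-bit value (the rule chosen above). */
static int64_t div512_signed(uint16_t x, uint16_t y, uint16_t z)
{
    return as_signed32(low32(x, y, z)) / 512;
}
```

With x=0, y=0, z=65535 the exact result is -127; the signed version gets it right and the unsigned version does not. With x=65535, y=65535, z=0 the exact result is 8388352 (which would fit in an int24); here the unsigned version is right and the signed version returns -255. When the intermediate stays within -2147483648..2147483647, the signed version always matches the exact result.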

> > Much as I like the CCS compiler, its handling of multiplies is IMHO quite
> > broken.  An 8x8->16 multiply is not significantly slower than 8x8->8, but
> > unless you write your own multiply (as I usually do) the only way to get a
> > 16 bit result from the multiply is to do a 16x16 multiply (MUCH slower!)

> Much faster!  I use that TWOBYTES trick to split WORDS (long) into BYTES,
> without the need to do two divides (well, one is a modulus op, but it calls
> the same 16bit divide routine.)  I haven't actually looked at CCS's mult
> /divide routines carefully, you really think they're bad? (as in slow, not
> necessarily inaccurate)

I don't think CCS's routines are terrible or anything (they're decent,
but not exceptional).  On the other hand, a 16x16->16 multiply has to do
about 3-4 times as much work as an 8x8->16 multiply.  It's therefore
unavoidable that it will be slower than an 8x8->16.
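
For a feel of where the extra work comes from, here is a generic shift-and-add multiply in both widths. This is a plain C sketch and not CCS's routine or a PIC implementation; it only shows that the 16x16 version runs twice as many iterations and has to shift a 16-bit multiplier, before counting any per-instruction cost on an 8-bit machine:

```c
#include <stdint.h>

/* 8x8->16: eight iterations, because the multiplier is 8 bits wide. */
static uint16_t mul8x8_16(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;    /* widened once, then shifted left each step */
    for (int i = 0; i < 8; i++) {
        if (b & 1u)
            product = (uint16_t)(product + addend);
        addend = (uint16_t)(addend << 1);
        b >>= 1;
    }
    return product;
}

/* 16x16->16: sixteen iterations; the result is truncated to 16 bits. */
static uint16_t mul16x16_16(uint16_t a, uint16_t b)
{
    uint16_t product = 0;
    for (int i = 0; i < 16; i++) {
        if (b & 1u)
            product = (uint16_t)(product + a);
        a = (uint16_t)(a << 1);
        b >>= 1;
    }
    return product;
}
```

Both give the same answer whenever the operands fit in 8 bits, for example 255*255 = 65025 either way; the 16x16 version simply spends more steps getting there.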