On Sat, 3 May 2014, Byron Jeff wrote:

> On Sat, May 03, 2014 at 04:46:23PM +0100, smplx wrote:
>>
>> On Sat, 3 May 2014, Byron Jeff wrote:
>>
>>> On Fri, May 02, 2014 at 11:06:10PM -0600, PICLIST wrote:
>>>> Further to my previous comment:
>>>>
>>>> It all makes sense when you think about it carefully.
>>
>> No it doesn't. We've just learned to accept that that is the way it is :-)
>>
>> Newbies that fall into the trap are the ones that are doing things correctly. They learned how to do simple arithmetic with numbers before they learned to program, and now, because C has had a standard retrofitted, we try to justify why they (the newbies) are wrong.
>
> I think that's a bit harsh. Compiler writers have to deal with the limitations of the machines the code will run on.

If you look at it like that I can see how you would feel it is harsh. But as a programmer I get upset when other programmers take short cuts and produce "bad" code. As a compiler writer I have the right to get upset when I see other compiler writers taking short cuts because it is easier for them.

If you really justify limited compilers by the limited hardware then surely you should not expect an 8 bit PIC16 to support 16 or 32 bit integers, let alone floating point math. Heck, you shouldn't even be using C, as there is no hardware stack to pass arguments on.

> In pure mathematics, integers are infinite unbounded sets. They cannot be so on a piece of hardware. Therefore there are some limitations.

In pure mathematics infinity is a process (actually one of many), but here we are dealing with two well defined small sets whose properties allow us to tell whether a number in one set is smaller than, larger than or equal to a number in the other set. Infinities don't come into it, and representations don't come into it (just as they don't when you compare an integer to a float). The only thing that really matters here is that we can compare the values and tell how they relate to each other. We are not asking for the sum or difference of these two numbers and trying to cram the result into an insufficiently small space. We just want to know the order, and we can do that very easily with the information available.

> Newbies will run into the same problem if they attempt to put a 46 bit value into a 32 bit int. The result makes no sense mathematically. However from the perspective of the limited machine the operation is being done on, it does make sense.

No, now we are not comparing like for like. The point is that all the information required to do a signed / unsigned comparison ***IS THERE***, unlike your 46 bit / 32 bit analogy, where information really is being lost.

Simple proof:

    // where int and unsigned int are both 16 bit values
    // and long is a signed 32 bit value
    int a;
    unsigned int b;

    if ((long)a < (long)b)
    {
        // both operands are represented exactly in a long,
        // so this comparison gives the mathematical answer
    }

works as a mathematician would expect.

> And my real pushback is the fact that there is this prevailing notion that it's always best to make things simple for newbies at all cost. This is often a mistake because newbies who are regular users of a tool will not remain newbies to that tool. And precisely the same gymnastics that would be necessary to align the outcome from the compiler to match a newbie's expectations will become the exact same performance hindrance that will annoy the intermediate or expert user.

No, I didn't mean we should be making things easier for newbies.

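To spell out the expectation in question, here is a minimal example of the trap itself (the values are hypothetical, again assuming 16 bit int and unsigned int and a 32 bit signed long):

    #include <stdio.h>

    int main(void)
    {
        int a = -1;
        unsigned int b = 1;

        // A newcomer reads this as -1 < 1. C converts a to unsigned int,
        // so the comparison actually performed is 65535 < 1 (on a 16 bit
        // int target; a similarly huge value on wider ints) and the
        // branch is not taken.
        if (a < b)
            printf("a < b : what a mathematician expects\n");
        else
            printf("a >= b : what the implicit conversion gives\n");

        // Widening both operands to a signed type that can hold every
        // value of both preserves the mathematical ordering.
        if ((long)a < (long)b)
            printf("(long)a < (long)b : true, as expected\n");

        return 0;
    }

That first comparison is exactly where the newbie and the standard part company.
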
What I was trying to show is that their expectations are unbiased by the language they have started to use, whereas we have grown so used to it that we just accept it. There would be no performance penalty in the hands of an experienced programmer who knows to use signed with signed and unsigned with unsigned. But it would certainly remove the occasional subtle bug that creeps in even in the hands of the most experienced programmer.

>>> It does. And the real takeaway is that when working with incompatible types you have to give the compiler hints as to the correct direction to go.
>>
>> This is what really really irritates me - they are not incompatible!!!
>
> Of course they are because of the physical limitations of the memory used to hold them. Signed and unsigned integers of exactly the same size cannot hold the same numbers. There are no negative numbers in an unsigned int. So they are incompatible.

No, they are not incompatible. A 16 bit (2's complement) signed variable can hold a number in the range -32768 to +32767, whereas an unsigned 16 bit variable can hold a number between 0 and 65535. It is trivial to work out whether one variable holds a value less than, equal to or greater than the other, so why are they incompatible from a comparison point of view? You seem to be arguing that "once the value has been loaded into a 16 bit variable it loses all meaning and just becomes a collection of bits, so that comparing one collection of bits with another is not possible if the collections are different". But the meaning has not been lost; the compiler knows what is in each variable. Why is it that the compiler can keep track of comparing a "long" (a 32 bit collection of bits) with a "float" (another 32 bit collection of bits) and generate the correct code to compare them?

> What would probably have been best for a newbie is to force an explicit cast. Then it would be a bit more obvious that the programmer has to make a decision about which part of the range is important. But again all it would do is annoy more advanced practitioners who either would be perfectly happy with the compiler's handling of the situation, or would throw in the explicit cast in order to direct the correct action for the situation.

But this isn't just about forcing newbies or advanced practitioners to do something. It's about the fact that the information is actually available to do something sensible with. And anyway, don't get me started on the horrors of explicit casting :-)

>> If they were then the compiler should refuse to add them together. It should force you to decide exactly how they should be treated.
>
> See above. C was always designed as a language for expert practitioners.

Designed? Hmm :-) Having used C since 1984, I think what you meant was that the programmer had to provide some of the smarts to get the compiler to generate code that actually worked. I remember when you could put an int as an actual argument where the formal had been declared as a float and spend all day trying to figure out that you were missing a cast :-)

> The coder is supposed to have an understanding of the standards and the reasons behind the choices that were made.

Oh come on, when has this stopped you looking at a program written in a language you are not proficient in? You have a look even though you may not know anything about it. You are curious whether there is anything special you can learn, whether it is a language you might like to play with.

And you don't do that by reading the full spec first :-)

Anyway, the C standards committee was mainly comprised of people with a vested interest in pushing the standard towards what their particular compiler already did. They were not so much interested in defining a language and rewriting their compilers; it was more a case of "this is what we have, so let's define a language around it so that your compiler behaves like mine".

> All of this is specifically so that the amount of required run time checking can be minimized.

Hmm, I disagree. If you really want to reduce run time checking you need much stricter compile time checking. Having used C for a very long time now, I get the impression it was more a case of "let's get a compiler together that can generate an executable". I remember when compiling a trivial 50 line program on a PDP11 took about 5 minutes.

> C does not have a strong typing system. It'll happily let anyone mix and match incompatible types without a peep. It'll simply do an implicit cast as best it can. Experts like it that way. It drives newbies nuts.

As an expert who has had to debug other people's systems, I can tell you they don't like it that way.

> It's a choice of the standards committee. A newbie that isn't happy with it can certainly make other choices as to what language to use.

You'd think. The problem is that often they don't get a choice. It's kind of like asking an Italian whether he would rather learn English or Welsh :-)

>> Look, this is how you compare a signed 16 bit integer to an unsigned 16 bit integer:
>>
>> int compare(int arg1, unsigned int arg2)
>> {
>>     if ((arg2 & 0x8000) != 0)
>>     {   // arg2 is above the int range, so arg1 < arg2
>>         return -1;
>>     }
>>
>>     if ((arg1 & 0x8000) != 0)
>>     {   // arg1 is negative, so arg1 < arg2
>>         return -1;
>>     }
>>
>>     // arg1 and arg2 are both in the same 15 bit range
>>     // return 0 if arg1 == arg2
>>     // return < 0 if arg1 < arg2
>>     // return > 0 if arg1 > arg2
>>     return arg1 - (int)arg2;
>> }

> Absolutely. And if this were encoded into the language, then the performance would really bite because this would have to be encoded for each and every incompatible type matchup.

Yes, for each and every one - but only for the incompatible types. Come on, we have this going on right now when it comes to comparing 16 bit and 32 bit values, not to mention any integer and float combination. Anyway, I've painted you a picture and you are only looking at the frame.

> Instead the standard simply states "I don't know what's important to you because you did not give me any typing hints. So I'm arbitrarily choosing an implicit cast and rolling with it. If you don't like it, then explicitly cast so I know what code to generate."

That is exactly why it is wrong - because the compiler ***DOES*** know what is important to you and it ***DOES*** have enough information to give you the right answer.

> By doing that all of the lovely code you wrote above disappears and you are left with a simple and efficient comparison.

Really? First and foremost, I encapsulated the code in a function to make it easier to see the overhead, which was just the 2 checks of the topmost bits of the signed and unsigned variables. The rest of the code would have been generated anyway. I did not want to present the obscure assembly code the compiler would actually produce; it would have been optimised down to only a few instructions anyway. Trust me, the additional overhead would have been tiny on a PIC16.

Or don't trust me, go grab the compiler and see for yourself :-)

You might like to think of it like this instead:

    if ((arg2 & 0x8000) != 0)
    {   // arg2 is above the int range, so arg1 < arg2
        goto lab1;
    }

    if ((arg1 & 0x8000) != 0)
    {   // arg1 is negative, so arg1 < arg2
        goto lab1;
    }

    if (arg1 < arg2)
    {
    lab1:
        do_something;
    }

or maybe like this:

    if (((arg2 & 0x8000) != 0) ||
        ((arg1 & 0x8000) != 0) ||
        (arg1 < arg2))
    {
        do_something;
    }

Secondly, this is actually more efficient than the "standard" C alternative, which would have been to promote both numbers to "long" before comparing. My example above would have become:

    int compare(int arg1, unsigned int arg2)
    {
        long x;

        // return 0 if arg1 == arg2
        // return < 0 if arg1 < arg2
        // return > 0 if arg1 > arg2
        x = (long)arg1 - (long)arg2;

        if (x < 0)
        {   // arg1 < arg2
            return -1;
        }
        else if (x > 0)
        {   // arg1 > arg2
            return 1;
        }

        // arg1 == arg2
        return 0;
    }

Again this is encapsulated. In a real world situation you might see something like this instead:

    if ((long)arg1 < (long)arg2)
    {
        do_something;
    }

I mean, really, look at the code that needs to be generated to do it the "standard" C way. You need to widen two 16 bit variables to two 32 bit variables (sign extending one, zero extending the other), then you need to do a multiprecision subtraction on those 32 bit variables, and then you still need to check the result (possibly a four byte "binary inclusive or" to see whether the two values were actually equal). No, the "standard" C way generates much less efficient code than the "right" way.

> C was never a language designed to hold the programmer's hand. It's a tool that does very well in the hands of someone who knows how to use it.
>
> It's a choice. I see it's a choice that you do not like. But there are valid reasons for it.

Like has nothing to do with it. I am an expert. C does not stop me doing what I want to do. I am comfortable with it. It does not bother me that I need to write lots of very basic things in C that another language might have built in as standard (e.g. the string handling of PHP). I am also comfortable writing assembler code and again do not mind writing tons of it. As an expert with no axe to grind I now find myself in a position where I must either shut up and pretend that C is doing this signed / unsigned garbage for a good reason ***OR*** actually stand up and point out that it is a flaw. There are ***NO*** valid reasons for it.

>> The only "real" extra code is in the checking of the top bits of the signed and unsigned integers. Generating this trivial overhead "inline" in a compiler is, well, trivial! It doesn't stop you using the same type in a comparison but it does stop stupid bugs getting through. This is one reason why I decided to write a non C compiler, to have the freedom to fix some of the stupid things that C does.
>
> And that's your choice. And while this comparison seems trivial to you now, 50 years ago when the roots of C were growing, compact and efficient code were paramount. The time and memory to add that comparison inline would have made the language a lot less usable.

No, this argument really does not hold water. If the people that developed the language saw fit to allow automatic casting between int and float, with all the overheads involved, then there is absolutely no reason why a little more effort could not have been made to compare signed and unsigned integers correctly. There would have been no difference in efficiency in the normal signed / signed and unsigned / unsigned comparisons.

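For what it's worth, nothing stops you wrapping the correct comparison up yourself today and never touching a long. A minimal sketch (the helper name is just for illustration):

    // True when the signed value a is mathematically less than the
    // unsigned value b. A negative a is less than any unsigned value;
    // otherwise a converts to unsigned without changing its value and
    // can be compared directly.
    static int less_signed_unsigned(int a, unsigned int b)
    {
        return (a < 0) || ((unsigned int)a < b);
    }

Write "if (less_signed_unsigned(arg1, arg2))" and any reasonable compiler should inline it down to little more than a sign test and one unsigned compare.
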
> Finally you miss the politics of standards. This implicit casting was encoded from the beginning. It was arbitrary. But once standardized, millions of lines of code that depend on the behavior have subsequently been generated. This takes the behavior and etches it in stone. Even if it were the wrong choice, it's too late to go back and fix it now.

No, it is not. I remember when the default behaviour of the address operator "&" was changed: it used to result in a "pointer to char", then it was changed to "pointer to void". Oh my god! The world was going to end! All the millions of lines of code that would no longer work! What about the indiscriminate use of "int" to hold pointers? And then suddenly there was a big push to separate "int" from "pointers". Now we have specially defined types that are guaranteed to be a specific size. Why does C need to remain static? Why can't it evolve? Why can't we have a safe signed int and a safe unsigned int that can be compared correctly?

> So all that can be done is to educate newcomers as to why the odd behavior arises and how to correct it. Simply put, the right answer according to the C standard (which BTW the OP figured out) is:
>
> if (a < (int) b) ...

No, that's not all that can be done. We, as users of the tools, can actually point out the problems that we've found with them. Maybe in the future the tools will get fixed. Maybe new tools will be invented and they will take into account the problems we have found. But one thing is certain: nothing will get better if we just ignore the problems.

> Of course this will break if b as an unsigned is larger than the range of an int. So it isn't a perfect fix. Like everything else, it's a choice.
>
> BAJ

Regards
Sergio Masci

-- 
http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist