On Sun, May 04, 2014 at 09:12:47PM +0100, smplx wrote:
>
> On Sat, 3 May 2014, Byron Jeff wrote:
>
> > On Sat, May 03, 2014 at 04:46:23PM +0100, smplx wrote:
> >>
> >> On Sat, 3 May 2014, Byron Jeff wrote:
> >>
> >>> On Fri, May 02, 2014 at 11:06:10PM -0600, PICLIST wrote:
> >>>> Further to my previous comment:
> >>>>
> >>>> It all makes sense when you think about it carefully.
> >>
> >> No it doesn't. We've just learned to accept that that is the way it is :-)
> >>
> >> Newbies that fall into the trap are the ones that are doing things
> >> correctly. They learned how to do simple arithmetic with numbers before
> >> they learned to program, and now because C has had a standard retrofitted
> >> we try to justify why they (the newbies) are wrong.
> >
> > I think that's a bit harsh. Compiler writers have to deal with the
> > limitations of the machines the code will run on.
>
> My favorite kind of debate...
>
> If you look at it like that I can see how you would feel it is harsh. But
> as a programmer I get upset when other programmers take short cuts and
> produce "bad" code. As a compiler writer I have the right to get upset
> when I see other compiler writers taking short cuts because it is easier
> for them.

I'm pretty sure if you examine the architecture and speeds of the machines
these languages were originally developed for, you will find that these
supposed "short cuts" were necessary. Let's not forget that the original C
developers were not cross compiling...

> If you really justify limited compilers by the limited hardware then
> surely you should not expect an 8 bit PIC16 to support 16 or 32 bit
> integers, let alone floating point math. Heck you shouldn't even be
> using C as there is no hardware stack to pass arguments on.

Actually these are all perfectly valid arguments, especially for the
baseline 16F PIC parts. Try implementing a C compiler completely onboard
one of these microcontrollers.
I think that you'll find that some choices were made for good reason.

> >
> > In pure mathematics, integers are infinite unbounded sets. They cannot be
> > so on a piece of hardware. Therefore there are some limitations.
>
> In pure mathematics infinity is a process (actually one of many) but here
> we are actually dealing with two well defined small sets the properties of
> which allow us to tell if a number in one set is smaller, larger or equal
> to a number in the other set. Infinities don't come into it,

Sure they do, because not all of the numbers are representable in both
sets. So by definition you have to do something special in order to match
the representations of those numbers in those two sets. If unbounded, then
unsigned integers would be a strict subset of signed integers, and nothing
special would have to be done to do the comparison.

I'm not arguing that you cannot do something out of the ordinary to make
the comparison work. What I'm arguing is that precisely because of the
physical limitations of the machine, you do have to do something out of
the ordinary to make the comparison work. When coupled with the
limitations of the development environment at the time, the cost of
implementing the "correct" feature was worth more than the benefit of
implementing it.

Let me give you an example from the Standalone PIC24F FORTH
interpreter/compiler that I'm working on right now. For debugging purposes
it's useful to have the entire name of each implemented word in the
dictionary attached to the word. For word search of the dictionary, I use
a 32 bit hash value of the name (Buzhash is the hashing algorithm). So I
added the feature. But it doesn't fit into a 16K part. Fortunately there
are 32K parts that I can use where it does fit. However, it's a feature
choice that forces me to forgo the smaller memory parts.
Now if I give this to a newbie who cannot list available words and/or list
the words in a definition because the names are not available, then am I
actually taking a shortcut when the environment precludes the addition of
such a feature?

> representations don't come into it (as they don't if you are comparing an
> integer to a float). The only thing that really matters here is the fact
> that we can compare the values and tell how they relate to each other. We
> are not asking for the sum or difference of these two numbers and trying
> to cram the result into an insufficiently small space. We just want to
> know the order and we can do that very easily with the information
> available.

But again it comes at a special cost which it's likely that Kernighan and
Ritchie could not afford when they were developing the language. And even
if it could be afforded by the time the standard was established, there
simply was too large a codebase in existence to justify changing the
behavior. Decisions like these are rarely made in a vacuum. There's a
constellation of system and political constraints that defines them.

BTW this is one of the reasons that there are so many programming
languages. A language creates a community. The standards arise from the
practices of that community. There are often those who disagree with those
practices. However, usually the practices are so deeply ingrained in that
community that the chances of change are NIL. So from that fracture both a
new language, and often a new community, arises.

> >
> > Newbies will run into the same problem if they attempt to put a 46 bit
> > value into a 32 bit int. The result makes no sense mathematically.
> > However from the perspective of the limited machine the operation is
> > being done on, it does make sense.
>
> No we are not comparing like for like now. The point is that all the
> information required to do a signed / unsigned comparison ***IS THERE***.
> Unlike your 46 bit / 32 bit analogy where the information is being lost.

No need to show the proof. It's clear we're talking at cross purposes. You
are arguing that because it could be done, it should have been done. I'm
arguing that the development environment at the time precluded having a
special case, and so an arbitrary default case was chosen. That practice
was then codified into the standard.

> simple proof:
>
> // where int and unsigned int are both 16 bit values
> // and longs are signed 32 bit values
> int a;
> unsigned int b;
>
> if ((long)a < (long)b) { }
>
> works as a mathematician would expect.

BTW this doesn't work if long and int are the same size, and the standard
allows for that. And if long is larger than int, it works precisely
because you sign extend the number into a larger space, so the sign bit no
longer shares a position with a data bit of the unsigned value. So I'm not
exactly sure what the above example proves...

> >
> > And my real pushback is the fact that there is this prevailing notion
> > that it's always best to make things simple for newbies at all cost.
> > This is often a mistake because newbies who are regular users of a tool
> > will not remain newbies to that tool. And precisely the same gymnastics
> > that would be necessary to align the outcome from the compiler to match
> > a newbie's expectations will become the exact same performance hindrance
> > that will annoy the intermediate or expert user.
>
> No, I didn't mean we should be making things easier for newbies. What I
> was trying to show is that their expectations are unbiased by the language
> they have started to use whereas we have grown so used to it that we just
> accept it.

But their expectation is incorrect because they are newbies. signed int
and unsigned int are different and only somewhat compatible types.
Technically a signed int is a 15/31 bit data quantity with a sign bit,
while an unsigned int is a 16/32 bit data quantity.
The rule to be learned is that if you wish to compare two somewhat
compatible types, be explicit about what type of comparison you wish to
do.

> There would be no performance penalties in the hands of an experienced
> programmer who knows to use signed with signed and unsigned with
> unsigned.

Because an experienced programmer will use an explicit cast.

> But it would certainly remove the occasional subtle bug that creeps in
> even in the hands of the most experienced programmer.

An issue that is resolved by not mixing types.

> >>>
> >>> It does. And the real takeaway is that when working with incompatible
> >>> types you have to give the compiler hints as to the correct direction
> >>> to go.
> >>
> >> This is what really really irritates me - they are not incompatible!!!
> >
> > Of course they are because of the physical limitations of the memory
> > used to hold them. signed and unsigned integers of exactly the same size
> > cannot hold the same numbers. There are no negative numbers in an
> > unsigned int. So they are incompatible.
>
> No they are not incompatible. A 16 bit (2's complement) signed variable
> can hold a number in the range -32768 to +32767 whereas an unsigned 16 bit
> variable can hold a number between 0 and 65535. It is trivial to calculate
> when one variable holds a value less than, equal to or greater than the
> other, so why are they incompatible from a comparison point of view?

Because the bit pattern 1111111111111111 represents two different values
depending on which of the two types holds it. Therefore you cannot do a
simple comparison when that bit pattern is assigned to those two different
types. Simply consider the fact that the bit pattern above isn't even
equal to itself when it is assigned to those two different types. So every
comparison operation is a special case, as opposed to if the two types
were the same.
> You seem to be arguing that "once the value has been loaded into a 16 bit
> variable it loses all meaning and just becomes a collection of bits,

I'm saying that the Standard's rules are designed to force the two items
to have the same type for comparison. This is done to simplify the
comparison.

> so that comparing one collection of bits with another is not possible if
> the collections are different". But the meaning has not been lost, the
> compiler knows what is in each variable. Why is it that the compiler can
> keep track of comparing a "long" (a 32 bit collection of bits) and a
> "float" (another 32 bit collection of bits) and generate the correct code
> to compare them?

I'll bet that it isn't correct. By definition you'll have to lose
precision on the long in order to convert it to a float.

> >
> > What probably had been best for a newbie is to force an explicit cast.
> > Then it would be a bit more obvious that the programmer has to make a
> > decision about which part of the range is important. But again all it
> > would do is annoy more advanced practitioners who either would be
> > perfectly happy with the compiler's handling of the situation, or would
> > throw in the explicit cast in order to direct the correct action for the
> > situation.
>
> But this isn't just about forcing newbies or advanced practitioners to do
> something. It's about the fact that the information is actually available
> to do something sensible with.

And for reasons that I've already outlined, the decision was made not to
do so. C was a language developed with an "efficiency at all cost"
mentality. Correct comparison of signed/unsigned types was not efficient.
If the programmer wanted to do that comparison, then fine. However, the
implementation wasn't going to go out of its way to generate a correct
special case at the cost of efficiency.

I'll have to take the rest on a bit later...

-- 
Byron A. Jeff
Chair: Department of Computer Science and Information Technology
College of Information and Mathematical Sciences
Clayton State University
http://faculty.clayton.edu/bjeff

-- 
http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist