On Sun, 4 May 2014, Byron Jeff wrote:

> On Sun, May 04, 2014 at 09:12:47PM +0100, smplx wrote:
>>
>>
>> On Sat, 3 May 2014, Byron Jeff wrote:
>>
>>> On Sat, May 03, 2014 at 04:46:23PM +0100, smplx wrote:
>>>>
>>>>
>>>> On Sat, 3 May 2014, Byron Jeff wrote:
>>>>
>>>>> On Fri, May 02, 2014 at 11:06:10PM -0600, PICLIST wrote:
>>>>>> Further to my previous comment:
>>>>>>
>>>>>> It all makes sense when you think about it carefully.
>>>>
>>>> No it doesn't. We've just learned to accept that that is the way it is :-)
>>>>
>>>> Newbies that fall into the trap are the ones that are doing things
>>>> correctly. They learned how to do simple arithmetic with numbers before
>>>> they learned to program, and now, because C has had a standard retrofitted,
>>>> we try to justify why they (the newbies) are wrong.
>>>
>>> I think that's a bit harsh. Compiler writers have to deal with the
>>> limitations of the machines the code will run on.
>>
>>
>
> My favorite kind of debate...
>
>>
>> If you look at it like that I can see how you would feel it is harsh. But
>> as a programmer I get upset when other programmers take short cuts and
>> produce "bad" code. As a compiler writer I have the right to get upset
>> when I see other compiler writers taking short cuts because it is easier
>> for them.
>
> I'm pretty sure if you examine the architecture and speeds of the machines
> these languages were originally developed for, you will find that these
> supposed "short cuts" were necessary.

I really don't think this was the case. If you look at the PDP-11 CPU architecture you will find that it has a very rich instruction set and set of addressing modes for these instructions. If speed of execution had been the paramount objective, they would have been better off not using dynamic stack-based parameter passing and local variables. Something more along the lines of a static run-time stack would have been much more appropriate.
The PDP-11 has "bit test" instructions, and it also has an overflow status flag which the "compare" instructions use to perform signed and unsigned comparisons. The only logical conclusion I can come up with is ***NOT*** that the "short cuts" were necessary because of hardware restrictions, but that they were necessary to get the compiler operational.

>
> Let's not forget that original C developers were not cross compiling...
>

If they had been cross compiling it would have been more of an excuse, because at least then they could have claimed they were writing code to be executed on a primitive embedded system (8080 maybe :-))

>>
>> If you really justify limited compilers by the limited hardware then
>> surely you should not expect an 8 bit PIC16 to support 16 or 32 bit
>> integers, let alone floating point math. Heck, you shouldn't even be
>> using C as there is no hardware stack to pass arguments on.
>
> Actually these are all perfectly valid arguments, especially for the
> baseline 16F PIC parts.
>
> Try implementing a C compiler completely onboard one of these
> microcontrollers. I think that you'll find that some choices were made for
> good reason.

But the original C compiler wasn't developed to run from the ROM space of an embedded MCU. It was developed on a minicomputer which had fast backing store and was capable of swapping memory between backing store and RAM.

>> here
>> we are actually dealing with two well defined small sets the properties of
>> which allow us to tell if a number in one set is smaller, larger or equal
>> to a number in the other set. Infinities don't come into it,
>
> Sure they do because not all of the numbers are representable in both sets.
> So by definition you have to do something special in order to match the
> representations of those numbers in those two sets.

But you have a clear indication in both numbers as to whether or not the ranges overlap and, if not, which is the greater.
It doesn't matter that you can't represent every number in both types of variables. What matters is what the programmer asked for, i.e. is arg1 < arg2?

>
> If unbounded then unsigned integers are a strict subset of signed integers.
> If unbounded then nothing special would have to be done to do the
> comparison.
>
> I'm not arguing that you cannot do something out of the ordinary to make
> the comparison work. What I'm arguing is that precisely because of the
> physical limitations of the machine, that you do have to do something out
> of the ordinary to make the comparison work.

But promoting (casting incompatible types to compatible types) ***IS*** out of the ordinary, and they made that work. In this case the compiler keeps track of the types of the variables in an expression and promotes them before use if necessary (sometimes needing to load the variables into special registers before being able to use them)... ***THIS IS OUT OF THE ORDINARY***. Here the compiler can't simply convert a C statement into a corresponding assembler instruction. If that were the objective (one-for-one conversion) then I would agree, but the fact is there are many cases where a single C statement needs to be converted into several (sometimes many) machine code instructions, even on a powerful CPU like the PDP-11.

>
> When coupled with the limitation of the development environment at the time
> the cost of implementing the "correct" feature was worth more than the
> benefit of implementing it.
>

I don't agree, but let's say I do. In that case the compiler should have thrown an error.

> Let me give you an example from the Standalone PIC24F FORTH
> interpreter/compiler that I'm working on right now. For debugging purposes
> it's useful to have the entire name of each implemented word in the
> dictionary attached to the word. For word search of the dictionary, I use a
> 32 bit hash value of the name (Buzhash is the hashing algorithm).
>
> So I added the feature.
> But it doesn't fit into a 16K part. Fortunately
> there are 32K parts that I can use where it does fit. However, it's a
> feature choice that I have to decide in forgoing the smaller memory parts.
>
> Now if I give this to a newbie who cannot list available words and/or list
> the words in a definition because the names are not available, then am I
> actually taking a shortcut when the environment precludes the addition of
> such a feature?
>
>> representations don't come into it (as they don't if you are comparing an
>> integer to a float). The only thing that really matters here is the fact
>> that we can compare the value and tell how they relate to each other. We
>> are not asking for the sum or difference of these two numbers and trying
>> to cram the result into an insufficiently small space. We just want to
>> know the order and we can do that very easily with the information
>> available.
>
> But again it's at a special cost which it's likely that Kernighan and
> Ritchie could not afford when they were developing the language. And even
> if it could be afforded by the time the standard was established, there
> simply was too large a codebase in existence to justify changing the
> behavior.
>
> Decisions like these are rarely generated in a vacuum. There's a
> constellation of system and political constraints that define them.
>
> BTW this is one of the reasons that there are so many programming
> languages. A language creates a community. The standards arise from the
> practices of that community. There are often those who disagree with those
> practices. However, usually they are so far ingrained in that community
> that the chances of change are NIL. So from that fracture both a new
> language and often a new community arise.
>>
>>>
>>> Newbies will run into the same problem if they attempt to put a 46 bit
>>> value into a 32 bit int. The result makes no sense mathematically.
>>> However
>>> from the perspective of the limited machine the operation is being done on,
>>> it does make sense.
>>
>> No we are not comparing like for like now. The point is that all the
>> information required to do a signed / unsigned comparison ***IS THERE***.
>> Unlike your 46 bit / 32 bit analogy where the information is being lost.
>>
>
> No need to show the proof. It's clear we're talking at cross purposes. You
> are arguing that because it could be done, it should have been done. I'm
> arguing that the development environment at the time precluded having a
> special case, and so an arbitrary default case was chosen. That practice
> was then codified into the standard.
>
>> simple proof:
>>
>> // where int and unsigned int are both 16 bit values
>> // and longs are signed 32 bit values
>> int a;
>> unsigned int b;
>>
>> if ((long)a < (long)b) { }
>>
>> works as a mathematician would expect.
>
> BTW this doesn't work if long and int are the same size. And BTW the
> standard allows for that.
>
> And also if long is larger than the int sizes, it works precisely because
> you sign extend the number into a larger space. So the sign bit no longer
> shares a bit with a data bit of the unsigned.
>
> So I'm not exactly sure what the above example proves...

The fact that the compiler knows how to sign-extend the signed int and zero-extend the unsigned int at the point of the comparison shows that the compiler has the information necessary to do the comparison correctly ***WITHOUT*** needing to actually do the extensions and perform the redundant multiprecision arithmetic. And yes, I understand what you mean about int and long being the same size, which is why I explicitly stated the size of ints and longs. Anyway, even if ints and longs were the same size you could just substitute (long long) or even "double" in place of long in the above example. The point hasn't changed: the compiler still has everything necessary to give you the right answer.
>
>>
>>>
>>> And my real pushback is the fact that there is this prevailing notion that
>>> it's always best to make things simple for newbies at all cost. This is
>>> often a mistake because newbies who are regular users of a tool will not
>>> remain newbies to that tool. And precisely the same gymnastics that would
>>> be necessary to align the outcome from the compiler to match a newbie's
>>> expectations will become the exact same performance hindrance that will
>>> annoy the intermediate or expert user.
>>
>> No, I didn't mean we should be making things easier for newbies. What I
>> was trying to show is that their expectations are unbiased by the language
>> they have started to use whereas we have grown so used to it that we just
>> accept it.
>
> But their expectation is incorrect because they are a newbie. signed int
> and unsigned int are different and somewhat compatible types. Technically a
> signed int is a 15/31 bit data quantity with a sign bit while an unsigned
> int is a 16/32 bit data quantity. The rule to be learned is that if you wish
> to compare two somewhat compatible types, be explicit about what type
> comparison you wish to do.
>
>>
>> There would be no performance penalties in the hands of an experienced
>> programmer who knows to use signed with signed and unsigned with unsigned.
>
> Because an experienced programmer will use an explicit cast.
>
>> But it would certainly remove the occasional subtle bug that creeps in
>> even in the hands of the most experienced programmer.
>
> An issue that is resolved by not mixing types.
>
>>
>>>>>
>>>>> It does. And the real takeaway is that when working with incompatible types
>>>>> you have to give the compiler hints as to the correct direction to go.
>>>>
>>>> This is what really really irritates me - they are not incompatible!!!
>>>
>>> Of course they are because of the physical limitations of the memory used
>>> to hold them.
>>> signed and unsigned integers of exactly the same size cannot
>>> hold the same numbers. There are no negative numbers in an unsigned int. So
>>> they are incompatible.
>>
>> No they are not incompatible. A 16 bit (2's complement) signed variable
>> can hold a number in the range -32768 to +32767 whereas an unsigned 16 bit
>> variable can hold a number between 0 and 65535. It is trivial to calculate
>> when one variable holds a value less than, equal to or greater than the
>> other, so why are they incompatible from a comparison point of view?
>
> Because when you have the bit pattern 1111111111111111 they represent two
> different values based on the two types.

There is another "bit" of information here which you are not taking into account. The signed variable has the pattern s1111111111111111 while the unsigned variable has the pattern u1111111111111111. The extra "bit" is virtual: its value ("s" or "u") is held in the compiler at compile time, and in the executable by virtue of the sequence of machine code instructions the compiler generates.

> Therefore you cannot do a simple
> comparison of the two when that bit pattern is assigned to those two
> different types.

You need to add a few machine code instructions (but only for the special case). And this is no more difficult than doing a multiprecision comparison.

> Simply consider the fact that the bit pattern above
> isn't even equal when it is assigned to those two different types.
>
> So every comparison operation is a special case as opposed to if the two
> types were the same.

No, every comparison is ***NOT*** a special case, only those between signed and unsigned ints. And as I have shown, the compiler knows which these are.

>
>>
>> You seem to be arguing that "once the value has been loaded into a 16 bit
>> variable that it loses all meaning and just becomes a collection of bits.
>
> I'm saying that the Standard's rules are designed to force the two items to
> have the same type for comparison. This is done to simplify the comparison.
>
>> So that comparing one collection of bits with another is not possible if
>> the collections are different". But the meaning has not been lost, the
>> compiler knows what is in each variable. Why is it that the compiler can
>> keep track of comparing a "long" (a 32 bit collection of bits) and a
>> "float" (another 32 bit collection of bits) and generate the correct code
>> to compare them?
>
> I'll bet that it isn't correct. By definition you'll have to lose precision
> on the long in order to convert it to a float.
>

Yes, you lose precision, but you do so in a controlled way. When you convert a precise number such as 2999995 to something with a little more meaning like "3 million" you round it (up, down, off, whatever) but it still makes sense in the context you are using it. When you convert a long to a float you lose the least significant 8 bits, but you can easily round the ***NEXT*** 24 bits (up, down, etc.). The fact is you ***CAN*** still compare your 32 bit int to a 32 bit float and get a valid result. Anyway, if it really bothers you, you could convert them both to double (you'll still get the same answer) :-)

>>
>>>
>>> What probably would have been best for a newbie is to force an explicit cast. Then
>>> it would be a bit more obvious that the programmer has to make a decision
>>> about which part of the range is important. But again all it would do is
>>> annoy more advanced practitioners who either would be perfectly happy with the
>>> compiler's handling of the situation, or would throw in the explicit cast
>>> in order to direct the correct action for the situation.
>>
>> But this isn't just about forcing newbies or advanced practitioners to do
>> something. It's about the fact that the information is actually available
>> to do something sensible with.
>
> And for reasons that I've already outlined, the decision was made not to do
> so. C was a language developed with an "efficiency at all cost" mentality.
> Correct comparison of signed/unsigned types was not efficient. If the
> programmer wanted to do that comparison, then fine. However, the
> implementation wasn't going to go out of its way to generate a correct
> special case at the cost of efficiency.
>
> I'll have to take the rest on a bit later...
> --
> Byron A. Jeff
> Chair: Department of Computer Science and Information Technology
> College of Information and Mathematical Sciences
> Clayton State University
> http://faculty.clayton.edu/bjeff
>

--
Friendly Regards

Sergio Masci
--
http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist