On Thu, 9 Jul 2009, Gerhard Fiedler wrote:

> sergio masci wrote:
>
> >> I was talking about standard libraries, where the programmer doesn't have to attach anything. The difference between libraries in general and standard libraries is that the standard libraries must conform to a standard to be a standard library, no more and no less than a compiler. Of course such a standard in general doesn't define how it is implemented, but it defines what it does.
>
> > So you are saying a standard library (I thought you were talking about the STL, as you were also discussing containers at the time) should be different from a user-defined library? I think the users would eat you alive if you dared create a divide between the standard and user libraries (such that the user libraries could not be optimised to the same high level as the standard libraries).
>
> You were talking about intent. For a normal library, the compiler doesn't know about intent. For a library with a standardized interface (for example, the C++ STL) the compiler does know about intent.
>
> >> This function should have been part of the interface in the first place. This is the tricky part of defining a good library interface: provide the interface that makes sense.
>
> > No, you've missed my point here. It's not that the function was missing, it's that the programmer didn't use it and the compiler had no way of knowing that there was a better way to do things - because they were library functions with a brick wall in the way.
>
> I'm not sure whether you're not missing my point. If all three functions are defined as functions of a standard library (including their semantics), the compiler "knows" about them and what they do. And if replace is a suitable (and more efficient) substitution for delete followed by an insert, it may substitute it.
>
> >> Of course.
> >> C for example lacks any multitasking specification; it wasn't relevant at the time the C spec was created, because it was handled by the operating system, exclusively.
>
> > I think you might want to check that. C was used in multitasking situations on bare metal (without a cosy OS or even a supervisor) for some time before the C spec was created. Look at all the apps written for embedded or process control systems in C. Also OCCAM and ADA supported multitasking directly.
>
> I was talking about C, not about other languages. And besides sequence points and some additional guarantees for volatile variables, you won't find much help in the C standard for multitasking, especially when multiple cores are involved.
>
> It has been done, but one needs to know what one is doing, and it's not necessarily portable. For example, the standard doesn't prohibit RMW problems that occur because the compiler loads two 16-bit variables into a 32-bit register, works on one, but writes both back (while the other one has potentially been overwritten in memory by another thread running on another core). There are a number of other such problems that are not addressed by the standard.
>
> > Having a language and compiler that would generate efficient multitasking code on both a 628 and Windows XP on a multicore processor would be a feat. But the hardest part (that of getting it to work on a 628) is done! Surely you can see that doing the same for a system with MUCH better resources is trivial in comparison.
>
> No, I can't see that. Portable, safe and efficient multitasking on a modern multicore system is in no way trivial. It is IMO more complex than multitasking on a 628, by a few orders of magnitude.
>
> > This discussion certainly is helping me think more deeply about things I have taken for granted for a long time now.
> > For one thing, I wonder just how popular C would have been without the conditional compilation capability provided by the C pre-processor, allowing one set of source files to be compiled for different targets. There again, the interface between the pre-processor and the compiler proper is another brick wall.
>
> I think they serve different purposes, but the C preprocessor has been used to address some of the C compiler's shortcomings. I don't think the separation (or "brick wall") between the two is really a problem. I think C would be a "better C" (especially for small micros) if it had a compile-time evaluation capability. And Pascal would have been a better Pascal if it came with a preprocessor.
>
> (The latter is a bit easier to address if you're not using an integrated IDE like Turbo Pascal, because then you can add a preprocessor step to your build. But then again, everybody would use a different preprocessor and the code wouldn't be portable again :)
>
> >> If the semantics of assign and mult are clearly defined by a standard, it could do the same optimizations.
>
> > Yes it could, but again it would be down to the compiler to generate the appropriate optimised code and not just generate code to handle the calling protocol of library functions.
>
> Right. But sometimes the compiler does generate the code for the library functions (or at least it's under the control of the programmer whether it does or not), so I'm not seeing the difference.
>
> Of course, the traditional C concept of separate translation units is just the way you're describing. But this doesn't have to be so, and supposedly e.g. HiTech's OCG or Microsoft's link-time code generation are attempts to improve on this.
>
> I think it may be (trying to be careful here :) that you're focusing too much on some arbitrary limitations of C.
> It's probably unavoidable, but many of C's limitations are arbitrary (that is, mostly historical) and not necessary.
>
> > Can you not see that in adding more and more attributes to the library, all you are really doing is trying to push the libraries into the compiler?
>
> I never thought or wrote about adding attributes to the library. I just wasn't thinking in terms of separate translation units; I was thinking of /standard/ libraries (that is, libraries with a clearly defined interface) and I was thinking about libraries that come in source code.
>
> > Yes, but we are talking about what comes naturally, not what comes of years of discipline in using a particular language. A great many people get caught by this type of integer division in C. It takes years for many people to reliably use integer division (without the odd slip). The compiler is there to help you. The computer is there to do what you want.
>
> I think when it comes to type conversion, it should always be explicit. This is especially important on small systems with limited resources. It is my experience that more often than not, if you don't do it explicitly, you have to comment it one way or another. I prefer not to comment it, but to do it explicitly in the code.
>
> > In XCSB the compiler says "Ah, you want to assign the result to a floating point variable, therefore I won't just discard the remainder, I'll do the calc as a float". In XCSB, if you wanted to do an integer division even though the result should be a float, you would write
> >
> >     x = (int)(y / z)
>
> IMO, as I just said, I don't like automatic conversions. I don't think the compiler has a good chance of figuring out what I want. Pinning the operation on the result is just as dangerous as pinning it on the operands. I may divide one 32-bit int by another 32-bit int and put the result into an 8-bit int. I don't want the division to be an 8-bit int division.
> I may divide an 8-bit int by another 8-bit int and put the result into a 32-bit int, and I don't want the division to be a 32-bit int division. And so on... I think such type conversions should be explicit.
>
> > I'm not. The fact is the compiler does not know. It has an interface definition for each function in the form of a function prototype in a header file somewhere. But that's about it.
>
> I think this may be the one misunderstanding that we have. I'm thinking of libraries that are available in source code with a standardized and clearly defined interface, like most (if not all) C++ standard libraries.
>
> Also, even though the comparison between XSCB (did I get it right this time? :) and C is kind of underlying here, I don't think that C's arbitrary limitations are relevant to this discussion /in principle/.
>
> I know that for everyday work, you don't compile the standard library; you use pre-compiled "brick walled" object code libraries. (I don't know how much e.g. Microsoft's link-time code generation brings here; supposedly, since it's at link time, it would also optimize library code that's pre-compiled.) But nothing would prevent the programmer from running the compiler on the whole source code.

Ok, I've read and re-read your posts, trying to understand what you are saying, in order to better understand how I am failing to get my point across.

You seem to be saying that, provided a set of libraries is well written and available in source form to the compiler, the compiler can compile these together with the user program and (given that the compiler is implemented well enough by the compiler writers) should be able to extract enough information from all the combined source code to generate a resulting executable that is as good as one that would be generated if the language had more built-in features such as STRING, LIST, DYNAMIC ARRAYS etc.
Furthermore, you seem to be saying that the compiler should be able to catch the same kinds of bugs in both cases.

Ok, given an infinitely fast build system with an infinite amount of RAM and a mega complex compiler that looks at every possible combination of source code - breaking it down and rearranging everything in an attempt to understand every possible consequence of every combination of statements given - then yes, I would agree. However, this is just completely impractical.

As an example, let's just consider how many different ways you can append a number (as a string) to another string. Here's a simple bit of code:

    char str[100];
    char buff[16];
    char *p;
    int j = 42, len;

    strcpy(str, "hello world");
    len = strlen(str);
    sprintf(buff, "%d", j);
    strcpy(str+len, buff);

another way of doing this would be

    strcpy(str, "hello world");
    sprintf(buff, "%d", j);
    strcat(str, buff);

yet another way would be

    sprintf(str, "hello world%d", j);

and another way would be

    strcpy(str, "hello world");
    sprintf(buff, "%d", j);
    len = strlen(str);
    strcpy(str+len, buff);

and another

    strcpy(str, "hello world");
    sprintf(buff, "%d", j);
    strcpy(str+strlen(str), buff);

and another

    strcpy(str, "hello world");
    sprintf(buff, "%d", j);
    len = strlen(str);
    p = str + len;
    strcpy(p, buff);

and another

    len = sprintf(str, "hello world");
    sprintf(buff, "%d", j);
    strcpy(str + len, buff);

and another

    sprintf(buff, "%d", j);
    strcpy(str + sprintf(str, "hello world"), buff);

So if I tell the compiler that all of the above are simply ways of appending the ASCII representation of a number to a string, what should it make of the following incorrect piece of code (len is used without ever being set):

    strcpy(str, "hello world");
    sprintf(buff, "%d", j);
    strcpy(str+len, buff);

How is the compiler supposed to know that I am trying to append the ASCII representation of a number to a string and that I got it wrong?
In this case, is the compiler just supposed to take the code I've written as correct (because it doesn't recognise what I actually wanted to do) and so generate an executable with a bug?

If strings were built into the language, we might instead write:

    str = "hello world" + string(j)

Wow, how cool is that! So easy to write, so easy to understand when you come back to it years later. Even a dumb compiler would have no trouble understanding this statement, and a clever compiler would be able to put some nice optimisations in place. And we even got rid of the huge printf library as a side effect.

I'm pretty sure that at this point you will (as others have done) start telling me that I should be using other functions to "encapsulate" this fragment of "knowledge". Ok, so you encapsulate it, you produce yet another standard library function, but you still have the same problem - the compiler still needs to be able to understand this fragment if it is to have the same capabilities as a compiler that has this as a built-in. The only thing you achieve by placing this fragment in a function is to reduce the possibility that someone will rewrite the code incorrectly. Note here that I say reduce, not eliminate. A user would still rather write a couple of lines of code than waste time looking up a trivial function in an over-inflated library. You need to give the user a real incentive to use this new library function.

On the other hand, if this fragment of functionality is built into the language (and compiler) and is itself a subset of a basic component (feature) of the language (e.g. string handling), then the user will embrace it sooner or later, because he/she will be constantly exposed to this feature and will gradually progress from using its basic parts to its more complex aspects.
Another problem with using functions as opposed to built-ins (call them features if you like) arises where you need to distinguish between overloaded functions whose parameters are all of the same types. Then you are stuck and need to resort to using different names for the functions, whereas it would be more natural to use one consistent name (hence function overloading in the first place).

So say, for example, I wanted to extract a sub-string from a string. I could have a function:

    // create a new string from 'pos1' to the end of the string
    str2 = substr(str1, pos1);

    // create a new string from 'pos2' back to the start of the string
    pos2 = -pos1;
    str2 = substr(str1, pos2);

    // create a new string from 'pos1' to 'pos2'
    str2 = substr(str1, pos1, pos2);

But what if I wanted to extract a sub-string of a given length rather than between two positions? I couldn't write

    str2 = substr(str1, pos1, len);

because this would be seen by the compiler as

    substr(char *, int, int);

which also corresponds to the use above of

    str2 = substr(str1, pos1, pos2);

I could have the equivalent functions

    // create a new string from 'pos1' to the end of the string
    str2 = substr_to_end(str1, pos1);

    // create a new string from 'pos1' back to the start of the string
    str2 = substr_from_start(str1, pos1);

    // create a new string from 'pos1' to 'pos2'
    str2 = substr(str1, pos1, pos2);

    // create a new string from 'pos1' of length 'len'
    str2 = substr_length(str1, pos1, len);

But this comes with its own hazards, mainly that the user could VERY easily write:

    str2 = substr(str1, pos1, len);

where he actually needed:

    str2 = substr_length(str1, pos1, len);

Realistically, how is a conventional compiler (one that is not mega complex and running on an infinitely fast build machine with an infinite amount of RAM) going to spot this type of mistake without adding a ton of attributes to the function prototype?
If strings were built in, we could simply say something like

    str2 = substr str1 from pos1 to pos2

or

    str2 = substr str1 from pos1 to end

or

    str2 = substr str1 from pos1 length len

I keep talking about the compiler understanding the intent of the programmer, and you keep saying that a compiler could do this if it had all the source available. I wonder if what you are really saying is that the compiler can do more error checking and optimisation because it has all the source rather than pre-compiled libraries? Because this is definitely not what I'm getting at by "intent". What I mean is (as above) where the compiler is able to recognise a fragment of code as meaning "do something special" (such as append one string to another). This (recognising intent) is easy to do if the language has an understanding of common basic types such as strings, lists etc, but incredibly hard to do if it does not.

I talk about the brick wall between the compiler and the libraries, and you respond with "make the source of the libraries available". Making the source available still means that the compiler needs to do a hell of a lot of work to try to understand the intent behind each and every function. How, for example, would the compiler recognise a function whose purpose is to search a list for a particular string and, if it does not exist, insert a copy of the string into the list in alphabetical order?

Look at the way programs are commented now, so that other programmers coming along later can understand what a fragment of code is actually trying to achieve. There again we have a brick wall between the comments and the compiler. The comments don't actually help the compiler verify or optimise the code.

What we need are features that make it easier for the programmer to understand the code. Features that cut down on the low-level, mundane, error-prone, repetitive code that the programmer needs to write.
Features that allow some of the code and comments to merge - making it hard for incorrect comments to be left in place, and making it easy to see what the source code actually means.

Does this all sound familiar? Isn't this one of the arguments made when trying to persuade users to move from assembler to high level languages?

Look at a small fragment for inserting an item into a list:

    item = &root;
    while (*item != NULL)
    {
        if (strcmp((*item)->key, key) > 0)
        {
            temp = *item;
            *item = new_item(key);
            item = &(*item)->next_item;
            *item = temp;
            break;
        }
        item = &(*item)->next_item;
    }

re-written another way:

    item = &root;
    while (*item != NULL)
    {
        if (strcmp((*item)->key, key) > 0)
        {
            break;
        }
        item = &(*item)->next_item;
    }

    if (*item != NULL && strcmp((*item)->key, key) != 0)
    {
        temp = *item;
        *item = new_item(key);
        item = &(*item)->next_item;
        *item = temp;
    }

Now try writing a rule that understands the above two small fragments as two identical units.

Friendly Regards
Sergio Masci

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist