On Wed, 15 Jul 2009, Gerhard Fiedler wrote:

> sergio masci wrote:
> 
> > It is now clear to me that you are talking about intrinsic functions.
> > Yes the function is defined in a /standard/ library but the compiler
> > also knows about the function independently of the library. 
> 
> Would you call the C++ standard library functions (like
> std::list::insert) "intrinsic functions"? At least the meaning that this
> term seems to have in the Microsoft VC++ and gcc compiler documentation
> is not what I'm talking about.

No.

> 
> 
> >>> if strings were built into the language we might instead write:
> >>> 
> >>> 	str = "hello world" + string(j)
> >> 
> >> See, in C++ for example, strings are /not/ built into the language,
> >> and you can write pretty much exactly this. (Not with the
> >> std::string, but if you extend it a bit, you can, so in the case of
> >> C++ it's not really a question whether or not it can be done with
> >> strings in a library but whether the library definition is
> >> sufficient.) 
> > 
> > But the C++ compiler understands what is going on here even less. We
> > now end up adding even more run time overheads just to make the
> > source code look better.
> 
> Not sure to what degree a compiler "understands", and I don't want to
> drift off in a discussion about the arbitrary shortcomings of C or C++.
> But when a compiler "knows" the intent of the line (because all
> operations that happen are defined in the language standard) and knows
> the implementation (because it of course "sees" the implementation of
> the operators that are implemented in the compiler, but it also sees the
> implementation of the library functions by making their sources
> available) -- what's the difference that's left between built-in and
> library?

Let's home in on "'knows' the intent of the line".

I am not talking about parsing a line and understanding its meaning, 
I'm talking about understanding several lines as a unit. What is the 
programmer trying to say in these several lines?

If I write:

	for (j=0; j<strlen(str); j++)
	{
		if (str[j] >= 'a'  &&  str[j] <= 'z')
		{
			str[j] = str[j] ^ ('a' ^ 'A');
		}
	}

Would you expect the compiler to flag "j<strlen(str)" or maybe replace it 
with the much more optimised:

	xlen = strlen(str);

	for (j=0; j<xlen; j++)
	{
		if (str[j] >= 'a'  &&  str[j] <= 'z')
		{
			str[j] = str[j] ^ ('a' ^ 'A');
		}
	}

maybe optimise it further as:

	xlen = strlen(str);

	for (j=0; j<xlen; j++)
	{
		ptr = &str[j];

		if (*ptr >= 'a'  &&  *ptr <= 'z')
		{
			*ptr = *ptr ^ ('a' ^ 'A');
		}
	}

So here the compiler looked at several statements as a single unit and was 
able to optimise them (this is quite within the capabilities of modern C 
compilers).

Now going a little further you could say that it was the intent of the 
programmer to "change all lower case alpha characters to uppercase". So 
the intent of the programmer spans several statements.

Put this another way, if I write in assembler:

	movf	X+0, w
	addwf	Y+0
	btfsc	STATUS, C
	incf	Y+1
	movf	X+1, w
	addwf	Y+1

This code adds one 16 bit variable to another 16 bit variable. Here the 
programmer's intent is to add 16 bit variable X to 16 bit variable Y.

It is far easier for you, me and the compiler to understand the 
programmer's intent if instead I write:

	Y = Y + X;
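
To make the comparison concrete, here is a rough C transcription of the 
assembler next to the one-line form. It is a sketch only: the function 
names are invented, and the variables are assumed to be stored low byte 
first (X+0 is the low byte), as in the assembler above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the assembler: adds x into y one byte
   at a time. Index 0 is the low byte, index 1 the high byte. */
void add16_bytewise(const uint8_t x[2], uint8_t y[2])
{
    assert(x != NULL && y != NULL);

    /* movf X+0,w / addwf Y+0 : add the low bytes */
    uint16_t low = (uint16_t)(y[0] + x[0]);
    y[0] = (uint8_t)low;

    /* btfsc STATUS,C / incf Y+1 : propagate the carry, if any */
    if (low > 0xFF) {
        y[1] = (uint8_t)(y[1] + 1);
    }

    /* movf X+1,w / addwf Y+1 : add the high bytes */
    y[1] = (uint8_t)(y[1] + x[1]);
}

/* The same intent stated directly: one 16 bit addition. */
uint16_t add16_direct(uint16_t x, uint16_t y)
{
    return (uint16_t)(y + x);
}
```

Both compute the same sum, but only the second says "16 bit add" in a form 
that needs no reconstruction from individual byte operations.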

> 
> 
> >> I agree that the lack of a built-in decent string type in C can be a
> >> pain, especially in terms of syntax. OTOH, I bet your strings are
> >> 8-bit strings. Now what if I need to handle Unicode strings? Wait
> >> for a compiler upgrade? And what if that compiler upgrade doesn't
> >> handle the Unicode encoding I need? 
> > 
> > Yes I understand your point of view, but 8-bit strings are still very
> > useful even if you need to use Unicode in the same program. Just like
> > integers are very useful even though you might need to use floating
> > point.
> 
> Right. My point was that if 8-bit strings are built-in and Unicode
> strings are in a library, and you are claiming (elsewhere) that the
> built-in syntax can be different from library syntax, then I need to
> make the Unicode string syntax completely different -- structurally
> different -- from 8-bit string syntax. Can you imagine that? What a
> pain.
> 

Yes, you are right, it would be a pain. But it is a move in the right 
direction.

> 
> >> (Snipped prelude to substr argument.)
> >> 
> >>> But this comes with its own hazards, mainly that the user could VERY
> >>> easily write:
> >>> 
> >>> 	str2 = substr(str1, pos1, len);
> >>> 
> >>> where he actually needed:
> >>> 
> >>> 	str2 = substr_lengeth(str1, pos1, len);
> >>> 
> >>> Realistically how is a conventional compiler (one that is not mega
> >>> complex and running on an infinitely fast build machine with an
> >>> infinite amount of RAM) going to spot this type of mistake without
> >>> adding a ton of attributes to the function prototype?
> >>> 
> >>> If strings were built in we could simply say something like
> >>> 
> >>> 	str2 = substr str1 from pos1 to pos2
> >>> 
> >>> or
> >>> 
> >>> 	str2 = substr str1 from pos1 to end
> >>> 
> >>> or
> >>> 
> >>> 	str2 = substr str1 from pos1 length len
> >> 
> >> This is a simple matter of syntax. You don't do much more here than
> >> comparing C-style syntax with BASIC-style syntax. A matter of
> >> taste... The BASIC-style syntax uses "substr ... from ... to" and
> >> "substr ... from ... length". A similar C-style syntax could use
> >> substr_from_to and substr_from_len -- or any number of similar
> >> variants. Then there's the LISP syntax, and a few others. I don't
> >> see what this has to do with built-in vs library. 
> > 
> > It makes a difference if you consider that each statement helps the
> > compiler understand what the purpose of a variable is. In the above
> > example 'length len' within the 'substr' statement allows the
> > compiler to understand that 'len' is being used to manipulate strings
> > in this fragment so it WOULD be able to help me catch a simple error
> > such as:
> > 
> > 	for (j=0; j<len; j++)
> > 	{
> > 		len2 = strlen(arr[j]);
> > 
> > 		arr2[j] = substr(arr[j], 0, len-2);
> > 	}
> 
> Be that as it may, but this is a difference between a specific C-style
> syntax and a specific BASIC-style syntax. There's nothing that would
> prevent a language to allow BASIC-style syntax for libraries (functions
> with several identifiers that separate the arguments and together form
> one library call), so I don't really see the point WRT our discussion of
> built-in vs library.

I think I understand what you mean. But just to clarify: You are 
suggesting that instead of simply defining a function as:

	substr(char *, int);
	substr(char *, int, int);

That I could instead write:

	substr(char * 'FROM' int);
	substr(char * 'FROM' int 'TO' int);
	substr(char * 'FROM' int 'LENGTH' int);

Interesting.

But you'd still need to be able to attach a boatload of attributes to the 
function to give the compiler the same capabilities it would have if these 
functions were actually built-in language constructs, and the compile 
times would be horrendous.

I think it would be do-able but you'd still need some kind of meta 
language to describe how these functions would interact with each other, 
their parameters and local variables that are used with them (from outside 
the call and not just as a parameter).
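
For a small taste of what such attributes look like today: GCC and Clang 
accept __attribute__((pure)), a promise that a function has no observable 
effect other than its return value, so repeated calls with the same 
arguments may be merged. The sketch below assumes a GCC-compatible 
compiler; my_strlen and count_lower are made-up names. Note that one 
attribute covers one narrow property, which is why I say you would need a 
great many of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strlen, tagged with the GCC/Clang "pure"
   extension: no side effects, result depends only on its argument and
   the memory it reads. */
__attribute__((pure)) size_t my_strlen(const char *s);

size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Counts 'a'..'z'. The loop never writes to s, so a compiler that
   honours "pure" is allowed to evaluate my_strlen(s) just once. */
size_t count_lower(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (size_t j = 0; j < my_strlen(s); j++) {
        if (s[j] >= 'a' && s[j] <= 'z') {
            count++;
        }
    }
    return count;
}
```

The attribute says nothing about what the function means, only that calls 
to it can be rearranged safely.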

> 
> FWIW, my point was about specific function names. You didn't use my
> function names, so I don't know what you meant here. If you want to, can
> you rewrite this using substr_from_to and substr_from_len, then explain
> what is the difference to the BASIC variant?
> 
 	for (j=0; j<len; j++)
 	{
 		len2 = strlen(arr[j]);
 
 		arr2[j] = substr_from_len(arr[j], 0, len-2);
 	}

Can you see that "len-2" should actually be "len2"?
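
For reference, here is a sketch of the loop as it was meant to be written. 
The thread never defines substr_from_len, so the signature and behaviour 
below (a malloc'd copy, clamped to the source length) are my assumptions, 
and copy_all is an invented wrapper. A C compiler accepts the "len-2" 
version just as happily, because both are valid integer expressions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library function (signature assumed for this sketch):
   returns a malloc'd copy of at most len characters of src, starting
   at pos. The caller frees the result. Returns NULL if malloc fails. */
char *substr_from_len(const char *src, size_t pos, size_t len)
{
    assert(src != NULL);

    size_t n = strlen(src);
    if (pos > n) {
        pos = n;                /* clamp the start to the end of src */
    }
    if (len > n - pos) {
        len = n - pos;          /* clamp the length to what is left */
    }

    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, src + pos, len);
    out[len] = '\0';
    return out;
}

/* The intended loop: len is the number of strings, len2 is the length
   of the current string. Only len2 belongs in the substr call. */
void copy_all(char *arr2[], const char *const arr[], size_t len)
{
    for (size_t j = 0; j < len; j++) {
        size_t len2 = strlen(arr[j]);

        arr2[j] = substr_from_len(arr[j], 0, len2);
    }
}
```

Nothing in the prototype tells the compiler that the third argument is a 
string length rather than an element count.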

> 
> >>> I wonder if what you are really saying is that the compiler can do
> >>> more error checking and optimisation because it has all the source
> >>> rather than pre-compiled libraries? 
> >> 
> >> What I'm saying is that if it has the intent /and/ the source (the
> >> implementation), it can apply both for (usually different, and
> >> complementary) optimizations. Not that different from what it can do
> >> for built-in constructs.
> > 
> > Ok, but you really are talking about intrinsics as I understand them
> > with the addition of a standard library function for each intrinsic.
> 
> We probably need a common definition of "intrinsic function". I was
> talking about standard libraries that contain functions (and other
> language elements) that are defined in the language standard, in the
> same way as the language elements that the compiler implements. And
> additionally, to give the compiler access to the specific implementation
> (not only to the intent), the libraries are available to the compiler in
> source code. 
> 
> The intrinsic functions that are used e.g. in gcc and VC++ are not of
> this type. Most (if not all) are not even standard.

An intrinsic function is one that the compiler understands intimately, not 
just what parameters it takes and what result it returns. A good example 
of an intrinsic function is the add operator ('+'). It seems to be used 
differently to a user defined function because you use it in expressions 
as an infix operator.

e.g.
	A = B + C

	funcX(B + C)

But this is just for convenience. If you were able to define a symbol in 
C, say 'ADD', with all the attributes that the '+' symbol has then '+' and 
'ADD' would both behave the same way and the compiler would generate the 
same optimised code for both.

e.g.
	A = B ADD C

	funcX(B ADD C)

In C++ you can actually write your own add ('+') operator function.

> 
> 
> >>> I talk about the brick wall between the compiler and the libraries
> >>> and you respond with "make the source of the libraries available". 
> >> 
> >> No. I say consider that the library is a /standard/ library. Adding
> >> the source code is in addition, so that the compiler not only
> >> "knows" the intent, but also sees the implementation.
> > 
> > got you. intrinsic + library
> 
> I'm not so sure... :)  "Intrinsic" seems to imply (possibly among other
> things) that the library is written by the compiler vendor. I don't mean
> to imply this. 

No, not necessarily. If the compiler vendor were explicit as to how an 
intrinsic function's counterpart in the library should be written then a 
third party could do the work. But it does seem like a lot of trouble to 
go to on the part of the compiler writer.

> 
> 
> >> Look at the C++ standard library definition for std::list::insert,
> >> for example. It contains a definition that allows each C++ compiler
> >> to "understand" what a call to std::list::insert is supposed to do.
> > 
> > I will look at this. Can you point me at a specific doc and library
> > so that I can be sure to look at exactly what you are looking at. 
> 
> Take any standardized language with a standardized library. For example,
> the C++ standard (the real one costs money, but here is a not quite up
> to date one <http://www.open-std.org/jtc1/sc22/open/n2356/>). And of
> course, anything that's missing /could/ be there -- it just isn't.
> 
> Or C# (ECMA-336) and the .NET CLI (ECMA-335), even though that's
> possibly a bit different. 

Ok, I've had a look on the net and it's just template stuff. As I've said 
before, the compiler isn't really understanding intent here, it's just 
mechanically churning out code and reducing it as much as possible. It 
gives the illusion that it understands what is going on because so much 
source code is being condensed into a small executable, but the reality is 
that all that source code is hand holding the compiler and telling it 
exactly what to generate.

Friendly Regards
Sergio Masci