On Thu, 16 Jul 2009, Gerhard Fiedler wrote:

> sergio masci wrote:
>
> >>> It is now clear to me that you are talking about intrinsic
> >>> functions. Yes the function is defined in a /standard/ library but
> >>> the compiler also knows about the function independently of the
> >>> library.
> >>
> >> Would you call the C++ standard library functions (like
> >> std::list::insert) "intrinsic functions"? At least the meaning that
> >> this term seems to have in the Microsoft VC++ and gcc compiler
> >> documentation is not what I'm talking about.
> >
> > No.
>
> But this is what I'm talking about. So I was right with my suspicion
> that I wasn't talking about what you called "intrinsics".
>
> > home in on "'knows' the intent of the line"
>
> This is what I'm trying to do. This is the reason why I want to
> understand what it is that makes a function that is defined by the
> standard (and is implemented in a library) different from a construct
> that is defined in the same standard and implemented inside the
> compiler. I haven't yet seen an example by you that I could understand
> -- and that wasn't about something else.

*** YOU *** are saying "'knows' the intent of the line".

*** I *** am saying "'knows' the intent of several separate lines as one
unit".

You are happy to see:

    X = X + 0;

reduced to NOTHING (not a criticism). You have an expectation of this
because the compiler understands the intent of the line.

I am saying that something like:

    j = x * 2 + y * 2;
    temp = arr[j];
    j = (x + y) * 2;
    arr[j] = temp;

should also be reduced to nothing *** AND *** the compiler should warn
the user that this piece of code has no effect and so may be in error.
This is not possible if you understand only the intent of single lines,
but it can be fudged by the compiler by keeping track of what is
evaluated and where the result is placed. So although the compiler
might be able to reduce it to zero, it doesn't understand that this
might be an error.

Things get much more difficult if we re-write the above as:

    int func_get(int arr[], int x, int y)
    {
        int j;

        j = x * 2 + y * 2;
        return arr[j];
    }

    void func_set(int arr[], int x, int y, int val)
    {
        int k;

        k = (x + y) * 2;
        arr[k] = val;
    }

    temp = func_get(arr, x, y);
    func_set(arr, x, y, temp);

Ok the compiler *** MIGHT *** still be able to hack this if it is very
smart and all the source for the functions is available. BUT change the
type of the array from a straightforward int to a struct and things get
incredibly complicated (I mean they are really complicated now with int
but they will get MUCH worse with a struct :)

Here, having built-in types like STRING, LIST etc reduces the
complication because the language and compiler control the way they are
used. It's like the difference between a menu interface and a command
line interface. With the menu interface you guide the user's
interactions and there can be no unexpected commands issued that will
break things (kind of).
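To make the struct point concrete, here is a rough sketch of what I
mean. The names rec_get / rec_set and the record layout are just made
up for illustration:

    typedef struct {
        int  id;
        char name[16];
    } record_t;

    record_t rec_get(record_t arr[], int x, int y)
    {
        int j;

        j = x * 2 + y * 2;
        return arr[j];      /* whole-struct copy out */
    }

    void rec_set(record_t arr[], int x, int y, record_t val)
    {
        int k;

        k = (x + y) * 2;
        arr[k] = val;       /* whole-struct copy back in */
    }

    void example(record_t arr[], int x, int y)
    {
        record_t temp;

        /* the round trip the compiler would have to prove is a no-op: */
        temp = rec_get(arr, x, y);
        rec_set(arr, x, y, temp);
    }

Now the compiler has to prove that every member of the struct survives
the copy unchanged, and that nothing aliases arr in between, before it
can eliminate the pair -- a much harder analysis than for a plain int.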
> >> Right. My point was that if 8-bit strings are built-in and Unicode
> >> strings are in a library, and you are claiming (elsewhere) that the
> >> built-in syntax can be different from library syntax, then I need
> >> to make the Unicode string syntax completely different --
> >> structurally different -- from 8-bit string syntax. Can you imagine
> >> that? What a pain.
> >
> > Yes you are right it would be a pain. But it is a move in the right
> > direction.
>
> I'm not sure. I think I wouldn't like it. For example, it seems that
> for some things Pascal-style strings are more efficient than C-style
> strings. There's nothing that prevents me from working with
> Pascal-style strings efficiently in C++ (and no matter whether ASCII,
> 8-bit with different codepages, different encodings of Unicode) -- in
> the same idiom that is used for standard strings in C++. This is
> because strings are /not/ built into the language (among other
> things).
>
> Being able to work with similar types in a similar way is important
> for code quality. If you need to use a different idiom for different
> string encodings, code quality goes down -- and code quality is
> important.

But the same argument could be made for floating point. Many people
require different precision (whether greater or less than that provided
by C) yet they either use what is available in a way that suits them or
they use a specialised library *** AS WELL *** as the supported
floating point. Often people will fit the solution to the tools
available.

> > I think I understand what you mean. But just to clarify: You are
> > suggesting that instead of simply defining a function as:
> >
> >     substr(char *, int);
> >     substr(char *, int, int);
> >
> > That I could instead write:
> >
> >     substr(char * 'FROM' int);
> >     substr(char * 'FROM' int 'TO' int);
> >     substr(char * 'FROM' int 'LENGTH' int);
> >
> > Interesting.
>
> That's one possible form, but it's not quite what I meant. What I
> meant is that you have a way to declare the function and syntax so
> that you actually can write
>
>     SUBSTR str1 FROM pos1 TO end
>
> and the compiler knows what library to call in which way. This doesn't
> look too complicated, and it's not actually that different from
>
>     SUBSTR_FROM_TO str1, pos1, end
>
> Just a slightly different syntax.
>
> > But you'd still need to be able to attach a boat load of attributes
> > to the function to give the compiler the same capabilities it would
> > have if these functions were actually built-in language constructs
> > and the compile times would be horrendous.
>
> I don't understand why. This doesn't seem to be much more complicated
> to parse than a normal function call. It's a starting token SUBSTR,
> followed by five tokens that have to be three expressions interspersed
> with FROM and TO.

It's not the parsing that's the problem. The problem is providing
information to the compiler about the way these functions interact. I'm
not talking about the calling protocol (the way parameters are
evaluated and passed on the stack or the result is returned), I'm
talking about how functions relate to each other over several lines of
code. Consider this:

    #include <stdlib.h>
    #include <string.h>

    void process_substr(char *str, int pos1, int pos2)
    {
        char *tmp_str;
        int len;

        len = pos2 - pos1;
        tmp_str = malloc(len + 1);
        strncpy(tmp_str, (str + pos1), len);
        ...
        ...
        ...
        free(tmp_str);
    }

Now turn this into a real example:

    char * string_alloc(int len)
    {
        char *tmp_str;

        tmp_str = malloc(len + 1);
        return tmp_str;
    }

    void string_assign(char *dst_str, char *src_str)
    {
        strcpy(dst_str, src_str);
    }

    char * string_substr(char *str, int pos1, int pos2)
    {
        static char own_str[256];
        int len;

        len = pos2 - pos1;
        strncpy(own_str, (str + pos1), len);
        own_str[len] = '\0';
        return own_str;
    }

    void string_release(char *str)
    {
        free(str);
    }

    void process_substr(char *str, int pos1, int pos2)
    {
        char *tmp_str;
        int len;

        len = pos2 - pos1;
        tmp_str = string_alloc(len);
        string_assign(tmp_str, string_substr(str, pos1, pos2));
        ...
        ...
        ...
        string_release(tmp_str);
    }

In the above, how would I tell the compiler the relationship between
string_alloc, string_assign, string_substr and string_release such that
the compiler is able to:

(1) ... indicate an error if tmp_str is not initialised with
    string_alloc before string_assign is used on it (simply assigning a
    value to tmp_str is not what I mean, I mean actually using
    string_alloc to initialise it).

(2) ... indicate an error if tmp_str goes out of scope before it is
    cleaned up with string_release

(3) ... optimise the combined use of string_assign and string_substr

(4) ... recognise that tmp_str is undefined after the use of
    string_release

This is why you need to be able to add special attributes to the
functions, to be able to tell the compiler about all this stuff (I
sketch below what such attributes might look like). And yes I know
about C++ (I've been using it for many years) but creating a boat load
of classes to *** TRY *** to do the same thing is not the same.
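Just to make "special attributes" concrete, here is a purely
hypothetical sketch. This annotation syntax does not exist in any C
compiler I know of and the attribute names are made up; it is only
meant to show the kind of information the compiler would need:

    /* HYPOTHETICAL annotations describing how the functions relate */
    char *string_alloc(int len)
        __attribute__((allocates(string_obj)));         /* starts lifetime */

    void string_assign(char *dst_str, char *src_str)
        __attribute__((requires(dst_str, string_obj))); /* checks (1) */

    char *string_substr(char *str, int pos1, int pos2)
        __attribute__((fuses_with(string_assign)));     /* enables (3) */

    void string_release(char *str)
        __attribute__((releases(str, string_obj)));     /* checks (2), (4) */

(As an aside, recent gcc versions do have a small real analogue of a
corner of this: declaring string_alloc with
__attribute__((malloc (string_release))) lets gcc warn when a pointer
from string_alloc is freed with something other than string_release.
But that only touches (2) and (4), nothing like the full set of
relationships.)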
> > I think it would be do-able but you'd still need some kind of meta
> > language to describe how these functions would interact with each
> > other, their parameters and local variables that are used with them
> > (from outside the call and not just as a parameter).
>
> I'm not sure what you mean by "interaction". The interaction of
> functions is defined by the language standard that defines what they
> do. Their arguments would be described just as they are in any
> procedural language. You'd need a bit of a meta language to define
> such a construct, but not much I think. I don't think this goes much
> further than a normal function declaration; just add to the
> "<type> <argument>" pairs a possible pair "KEYWORD <argument>".

One of the reasons we have a simplified common protocol that allows any
function to be called from anywhere in the program is to eliminate the
complexity that would otherwise come about by trying to juggle the
actual parameters used in-line (where the function is called) with the
formal parameters (how the parameters are defined within the function).

Consider this:

    int func_A(float x)
    {
        ...
    }

    int val_1;
    unsigned char val_2;
    float val_3;

    func_A(val_1);
    func_A(val_2);
    func_A(val_3);

In each of these cases C ensures that the actual parameters val_1,
val_2 and val_3 are floats before passing them to func_A. If they are
not already floats, it promotes them. Furthermore, the actual parameter
(val_1, val_2 or val_3) is copied (promoted first if necessary) to a
place where the function expects it to be when it is called. The
function does not operate on val_1, val_2 or val_3 directly. Something
similar happens for the return value. All of this is the calling
protocol.

So yes, the compiler "understands" a standard library function in as
much as it "understands" its parameter requirements. Other than
ensuring that actual parameters are promoted to the correct type and
copied to the correct place, the compiler does very little else across
the function boundary. Some compilers are much more intelligent than
others and try to see through (or even punch through) the function
boundary, but there again there is only so much that the compiler can
do because of the complexity of the function. Regardless, it is still
very difficult for the compiler to combine information across the
function boundary, because the function needs to stick to the calling
protocol if it is to be used elsewhere.

Inline functions reduce the calling protocol burden because it becomes
possible to tailor each instance of the called function to the place
where it is called. However, there is only so much you can do with this
because these inline functions may themselves call other system
functions which are not inline. AND this doesn't help with the
"understanding" of the relationships between functions (see the
string_assign example I gave above). For this you still need some way
of adding specialised attributes to a function which tell the compiler
about these relationships.

C++ templates give the illusion that the compiler really understands
what's going on - it still doesn't. What's happening is that a lot of
stuff gets expanded "inline" (not just functions), so the compiler is
able to do lots of optimisations. For example it might see that an int
is being promoted to a float so that it can be passed to a function
(method) which then converts it to an int to pass it to another
function (method). The compiler might then simply allow the promotion
and conversion to cancel each other out and use the original. This
looks to the user as if the compiler understood what is going on. It
doesn't. It no more understands than a calculator does when you press
the +/- key twice to get back to the original value. The C++ compiler
still doesn't understand the relationships between functions. The
programmer does, and he arranges all the objects to get sensible
optimised results.
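Here is a small example of the kind of mechanical cancellation I mean.
The helper names are made up, and I'm using double rather than float so
that the round trip is always exact:

    static inline double as_double(int x) { return (double)x; }
    static inline int    as_int(double d) { return (int)d; }

    int round_trip(int x)
    {
        /* After inlining, the compiler can fold (int)(double)x back to
           plain x -- every 32-bit int survives the round trip through
           double exactly -- without ever "understanding" what the code
           was for. */
        return as_int(as_double(x));
    }

A decent optimiser will reduce round_trip to just returning its
argument, but that is pattern matching on the expanded code, not
comprehension.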
> > An intrinsic function is one that the compiler understands
> > intimately. Not just what parameters it takes and what result it
> > returns.
>
> Next step is to define what "understands intimately" means. Does a
> specification (like in a language standard) qualify?

No.

> > In C++ you can actually write your own add ('+') operator function.
>
> Which would be different from the standard operator+() functions, in
> that the compiler doesn't "know" what they are supposed to be doing
> and has to treat them as normal function calls (in the case of C++
> with the special rules that the language standard defines for
> operator+(), of course).
>
> >> Take any standardized language with a standardized library. For
> >> example, the C++ standard (the real one costs money, but here is a
> >> not quite up to date one ). And of course, anything that's missing
> >> /could/ be there -- it just isn't.
> >>
> >> Or C# (ECMA-336) and the .NET CLI (ECMA-335), even though that's
> >> possibly a bit different.
> >
> > Ok I've had a look on the net and it's just template stuff. As I've
> > said before, the compiler isn't really understanding intent here,
> > it's just mechanically churning out code and reducing it as much as
> > possible. It gives the illusion that it understands what is going
> > on because so much source code is being condensed into a small
> > executable, but the reality is that all that source code is hand
> > holding the compiler and telling it exactly what to generate.
>
> Not sure whether the standard can tell you what kind of optimizations
> actual compilers implement. Much of this is probably a trade secret.
> Then I'm not convinced a casual look at this is enough to find out
> what /could/ be implemented. They don't talk about optimizations in
> the standard; they just say what has to be the result.
>
> Going back to one of your original arguments, the one that prompted
> my question about what the difference is between list operations that
> are implemented by the compiler and list operations that are
> implemented in a standard library: the substitution of a delete from
> a list followed by an insert into a list at the same location by a
> replace.
>
> If you want to implement such an optimization in your compiler,
> what's the difference between having these functions as part of the
> compiler, or having them defined as standard library functions in the
> standard that the compiler is based upon? In both cases, the compiler
> "knows" that this substitution is possible, and the exact
> implementation of the three functions is not necessary to be known
> for this optimization to be possible.

Yes, I understand what you mean here. But in the case of functions this
relies on the compiler actually recognising the function names and
executing some internal code to perform checks, issue warnings (or
errors) and perform optimisations. That internal compiler code needs to
be activated somehow and it needs to perform a very specific task (not
something that would be there anyway - something that is written
specially for that one purpose).

If you really wanted to you could say that the function names are
undefined to the compiler, but then you need some way of telling the
compiler which functions are special (what their names are) and what is
special about them (under special circumstances you can replace func_A
+ func_B with func_C). This is why you would need to be able to add
special attributes to a function, so that you can tell the compiler
that these are special functions that need to be handled in a special
way (execute the special internal code for them). (sorry about all the
specials :-)

Other than having special attributes, you really would need to make
these functions intrinsic (known and understood by the compiler) so
that it can execute its special internal code when it sees them in the
user's source code.

Friendly Regards
Sergio Masci

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist