On Thu, 16 Jul 2009, Gerhard Fiedler wrote:

> sergio masci wrote:
>
> >>> It is now clear to me that you are talking about intrinsic
> >>> functions. Yes the function is defined in a /standard/ library but
> >>> the compiler also knows about the function independently of the
> >>> library.
> >>
> >> Would you call the C++ standard library functions (like
> >> std::list::insert) "intrinsic functions"? At least the meaning that
> >> this term seems to have in the Microsoft VC++ and gcc compiler
> >> documentation is not what I'm talking about.
> >
> > No.
>
> But this is what I'm talking about. So I was right with my suspicion
> that I wasn't talking about what you called "intrinsics".
>
> > home in on "'knows' the intent of the line"
>
> This is what I'm trying to do. This is the reason why I want to
> understand what it is that makes a function that is defined by the
> standard (and is implemented in a library) different from a construct
> that is defined in the same standard and implemented inside the
> compiler. I haven't yet seen an example by you that I could understand
> -- and that wasn't about something else.

*** YOU *** are saying "'knows' the intent of the line".

*** I *** am saying "'knows' the intent of several separate lines as one
unit".

You are happy to see:

    X = X + 0;

reduced to NOTHING (not a criticism). You have an expectation of this
because the compiler understands the intent of the line.

I am saying that something like:

    j = x * 2 + y * 2;
    temp = arr[j];
    j = (x + y) * 2;
    arr[j] = temp;

should also be reduced to nothing *** AND *** the compiler should warn
the user that this piece of code has no effect and so may be in error.
This is not possible if you understand only the intent of single lines,
but it can be fudged by the compiler by keeping track of what is
evaluated and where the result is placed. So although the compiler
might be able to reduce it to zero, it doesn't understand that this
might be an error.

Things get much more difficult if we re-write the above as:

    int func_get(int arr[], int x, int y)
    {
        int j;

        j = x * 2 + y * 2;
        return arr[j];
    }

    void func_set(int arr[], int x, int y, int val)
    {
        int k;

        k = (x + y) * 2;
        arr[k] = val;
    }

    temp = func_get(arr, x, y);
    func_set(arr, x, y, temp);

Ok the compiler *** MIGHT *** still be able to hack this if it is very
smart and all the source for the functions is available. BUT change the
type of the array from a straightforward int to a struct and things get
incredibly complicated (I mean they are really complicated now with int
but they will get MUCH worse with a struct :)

Here, having built-in types like STRING, LIST etc reduces the
complication because the language and compiler control the way they are
used. It's like the difference between a menu interface and a command
line interface. With the menu interface you guide the user's
interactions and there can be no unexpected commands issued that will
break things (kind of).
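To make the struct point concrete, here is a rough sketch of what I
mean. The names rec_get / rec_set and the record layout are just made
up for illustration:

    typedef struct {
        int  id;
        char name[16];
    } record_t;

    record_t rec_get(record_t arr[], int x, int y)
    {
        int j;

        j = x * 2 + y * 2;
        return arr[j];      /* whole-struct copy out */
    }

    void rec_set(record_t arr[], int x, int y, record_t val)
    {
        int k;

        k = (x + y) * 2;
        arr[k] = val;       /* whole-struct copy back in */
    }

    void example(record_t arr[], int x, int y)
    {
        record_t temp;

        /* the round trip the compiler would have to prove is a no-op: */
        temp = rec_get(arr, x, y);
        rec_set(arr, x, y, temp);
    }

Now the compiler has to prove that every member of the struct survives
the copy unchanged, and that nothing aliases arr in between, before it
can eliminate the pair -- a much harder analysis than for a plain int.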
> >> Right. My point was that if 8-bit strings are built-in and Unicode
> >> strings are in a library, and you are claiming (elsewhere) that the
> >> built-in syntax can be different from library syntax, then I need
> >> to make the Unicode string syntax completely different --
> >> structurally different -- from 8-bit string syntax. Can you imagine
> >> that? What a pain.
> >
> > Yes you are right it would be a pain. But it is a move in the right
> > direction.
>
> I'm not sure. I think I wouldn't like it. For example, it seems that
> for some things Pascal-style strings are more efficient than C-style
> strings. There's nothing that prevents me from working with
> Pascal-style strings efficiently in C++ (and no matter whether ASCII,
> 8-bit with different codepages, different encodings of Unicode) -- in
> the same idiom that is used for standard strings in C++. This is
> because strings are /not/ built into the language (among other
> things).
>
> Being able to work with similar types in a similar way is important
> for code quality. If you need to use a different idiom for different
> string encodings, code quality goes down -- and code quality is
> important.

But the same argument could be made for floating point. Many people
require different precision (whether greater or less than that provided
by C) yet they either use what is available in a way that suits them or
they use a specialised library *** AS WELL *** as the supported
floating point. Often people will fit the solution to the tools
available.

> > I think I understand what you mean. But just to clarify: You are
> > suggesting that instead of simply defining a function as:
> >
> >     substr(char *, int);
> >     substr(char *, int, int);
> >
> > That I could instead write:
> >
> >     substr(char * 'FROM' int);
> >     substr(char * 'FROM' int 'TO' int);
> >     substr(char * 'FROM' int 'LENGTH' int);
> >
> > Interesting.
>
> That's one possible form, but it's not quite what I meant. What I
> meant is that you have a way to declare the function and syntax so
> that you actually can write
>
>     SUBSTR str1 FROM pos1 TO end
>
> and the compiler knows what library to call in which way. This doesn't
> look too complicated, and it's not actually that different from
>
>     SUBSTR_FROM_TO str1, pos1, end
>
> Just a slightly different syntax.
>
> > But you'd still need to be able to attach a boat load of attributes
> > to the function to give the compiler the same capabilities it would
> > have if these functions were actually built-in language constructs
> > and the compile times would be horrendous.
>
> I don't understand why. This doesn't seem to be much more complicated
> to parse than a normal function call. It's a starting token SUBSTR,
> followed by five tokens that have to be three expressions interspersed
> with FROM and TO.

It's not the parsing that's the problem. The problem is providing
information to the compiler about the way these functions interact. I'm
not talking about the calling protocol (the way parameters are
evaluated and passed on the stack or the result is returned), I'm
talking about how functions relate to each other over several lines of
code. Consider this:

    #include <stdlib.h>
    #include <string.h>

    void process_substr(char *str, int pos1, int pos2)
    {
        char *tmp_str;
        int len;

        len = pos2 - pos1;
        tmp_str = malloc(len + 1);
        strncpy(tmp_str, (str + pos1), len);
        ...
        ...
        ...
        free(tmp_str);
    }

Now turn this into a real example:

    char * string_alloc(int len)
    {
        char *tmp_str;

        tmp_str = malloc(len + 1);
        return tmp_str;
    }

    void string_assign(char *dst_str, char *src_str)
    {
        strcpy(dst_str, src_str);
    }

    char * string_substr(char *str, int pos1, int pos2)
    {
        static char own_str[256];
        int len;

        len = pos2 - pos1;
        strncpy(own_str, (str + pos1), len);
        own_str[len] = '\0';
        return own_str;
    }

    void string_release(char *str)
    {
        free(str);
    }

    void process_substr(char *str, int pos1, int pos2)
    {
        char *tmp_str;
        int len;

        len = pos2 - pos1;
        tmp_str = string_alloc(len);
        string_assign(tmp_str, string_substr(str, pos1, pos2));
        ...
        ...
        ...
        string_release(tmp_str);
    }

In the above, how would I tell the compiler the relationship between
string_alloc, string_assign, string_substr and string_release such that
the compiler is able to:

(1) ... indicate an error if tmp_str is not initialised with
    string_alloc before string_assign is used on it (simply assigning a
    value to tmp_str is not what I mean, I mean actually using
    string_alloc to initialise it).

(2) ... indicate an error if tmp_str goes out of scope before it is
    cleaned up with string_release

(3) ... optimise the combined use of string_assign and string_substr

(4) ... recognise that tmp_str is undefined after the use of
    string_release

This is why you need to be able to add special attributes to the
functions, to be able to tell the compiler about all this stuff (I
sketch below what such attributes might look like). And yes I know
about C++ (I've been using it for many years) but creating a boat load
of classes to *** TRY *** to do the same thing is not the same.
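Just to make "special attributes" concrete, here is a purely
hypothetical sketch. This annotation syntax does not exist in any C
compiler I know of and the attribute names are made up; it is only
meant to show the kind of information the compiler would need:

    /* HYPOTHETICAL annotations describing how the functions relate */
    char *string_alloc(int len)
        __attribute__((allocates(string_obj)));         /* starts lifetime */

    void string_assign(char *dst_str, char *src_str)
        __attribute__((requires(dst_str, string_obj))); /* checks (1) */

    char *string_substr(char *str, int pos1, int pos2)
        __attribute__((fuses_with(string_assign)));     /* enables (3) */

    void string_release(char *str)
        __attribute__((releases(str, string_obj)));     /* checks (2), (4) */

(As an aside, recent gcc versions do have a small real analogue of a
corner of this: declaring string_alloc with
__attribute__((malloc (string_release))) lets gcc warn when a pointer
from string_alloc is freed with something other than string_release.
But that only touches (2) and (4), nothing like the full set of
relationships.)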
> > I think it would be do-able but you'd still need some kind of meta
> > language to describe how these functions would interact with each
> > other, their parameters and local variables that are used with them
> > (from outside the call and not just as a parameter).
>
> I'm not sure what you mean by "interaction". The interaction of
> functions is defined by the language standard that defines what they
> do. Their arguments would be described just as they are in any
> procedural language. You'd need a bit of a meta language to define
> such a construct, but not much I think. I don't think this goes much
> further than a normal function declaration; just add to the
> "<type> <argument>" pairs a possible pair "KEYWORD <argument>".

One of the reasons we have a simplified common protocol that allows any
function to be called from anywhere in the program is to eliminate the
complexity that would otherwise come about by trying to juggle the
actual parameters used in-line (where the function is called) with the
formal parameters (how the parameters are defined within the function).

Consider this:

    int func_A(float x)
    {
        ...
    }

    int val_1;
    unsigned char val_2;
    float val_3;

    func_A(val_1);
    func_A(val_2);
    func_A(val_3);

In each of these cases C ensures that the actual parameters val_1,
val_2 and val_3 are floats before passing them to func_A. If they are
not already floats, it promotes them. Furthermore, the actual parameter
(val_1, val_2 or val_3) is copied (promoted first if necessary) to a
place where the function expects it to be when it is called. The
function does not operate on val_1, val_2 or val_3 directly. Something
similar happens for the return value. All of this is the calling
protocol.

So yes, the compiler "understands" a standard library function in as
much as it "understands" its parameter requirements. Other than
ensuring that actual parameters are promoted to the correct type and
copied to the correct place, the compiler does very little else across
the function boundary. Some compilers are much more intelligent than
others and try to see through (or even punch through) the function
boundary, but there again there is only so much that the compiler can
do because of the complexity of the function. Regardless, it is still
very difficult for the compiler to combine information across the
function boundary, because the function needs to stick to the calling
protocol if it is to be used elsewhere.

Inline functions reduce the calling protocol burden because it becomes
possible to tailor each instance of the called function to the place
where it is called. However, there is only so much you can do with this
because these inline functions may themselves call other system
functions which are not inline. AND this doesn't help with the
"understanding" of the relationships between functions (see the
string_assign example I gave above). For this you still need some way
of adding specialised attributes to a function which tell the compiler
about these relationships.

C++ templates give the illusion that the compiler really understands
what's going on - it still doesn't. What's happening is that a lot of
stuff gets expanded "inline" (not just functions), so the compiler is
able to do lots of optimisations. For example it might see that an int
is being promoted to a float so that it can be passed to a function
(method) which then converts it to an int to pass it to another
function (method). The compiler might then simply allow the promotion
and conversion to cancel each other out and use the original. This
looks to the user as if the compiler understood what is going on. It
doesn't. It no more understands than a calculator does when you press
the +/- key twice to get back to the original value. The C++ compiler
still doesn't understand the relationships between functions. The
programmer does, and he arranges all the objects to get sensible
optimised results.
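Here is a small example of the kind of mechanical cancellation I mean.
The helper names are made up, and I'm using double rather than float so
that the round trip is always exact:

    static inline double as_double(int x) { return (double)x; }
    static inline int    as_int(double d) { return (int)d; }

    int round_trip(int x)
    {
        /* After inlining, the compiler can fold (int)(double)x back to
           plain x -- every 32-bit int survives the round trip through
           double exactly -- without ever "understanding" what the code
           was for. */
        return as_int(as_double(x));
    }

A decent optimiser will reduce round_trip to just returning its
argument, but that is pattern matching on the expanded code, not
comprehension.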
> > An intrinsic function is one that the compiler understands
> > intimately. Not just what parameters it takes and what result it
> > returns.
>
> Next step is to define what "understands intimately" means. Does a
> specification (like in a language standard) qualify?

No.

> > In C++ you can actually write your own add ('+') operator function.
>
> Which would be different from the standard operator+() functions, in
> that the compiler doesn't "know" what they are supposed to be doing
> and has to treat them as normal function calls (in the case of C++
> with the special rules that the language standard defines for
> operator+(), of course).
>
> >> Take any standardized language with a standardized library. For
> >> example, the C++ standard (the real one costs money, but here is a
> >> not quite up to date one ). And of course, anything that's missing
> >> /could/ be there -- it just isn't.
> >>
> >> Or C# (ECMA-336) and the .NET CLI (ECMA-335), even though that's
> >> possibly a bit different.
> >
> > Ok I've had a look on the net and it's just template stuff. As I've
> > said before, the compiler isn't really understanding intent here,
> > it's just mechanically churning out code and reducing it as much as
> > possible. It gives the illusion that it understands what is going
> > on because so much source code is being condensed into a small
> > executable, but the reality is that all that source code is hand
> > holding the compiler and telling it exactly what to generate.
>
> Not sure whether the standard can tell you what kind of optimizations
> actual compilers implement. Much of this is probably a trade secret.
> Then I'm not convinced a casual look at this is enough to find out
> what /could/ be implemented. They don't talk about optimizations in
> the standard; they just say what has to be the result.
>
> Going back to one of your original arguments, the one that prompted
> my question about what the difference is between list operations that
> are implemented by the compiler and list operations that are
> implemented in a standard library: the substitution of a delete from
> a list followed by an insert into a list at the same location by a
> replace.
>
> If you want to implement such an optimization in your compiler,
> what's the difference between having these functions as part of the
> compiler, or having them defined as standard library functions in the
> standard that the compiler is based upon? In both cases, the compiler
> "knows" that this substitution is possible, and the exact
> implementation of the three functions is not necessary to be known
> for this optimization to be possible.

Yes, I understand what you mean here. But in the case of functions this
relies on the compiler actually recognising the function names and
executing some internal code to perform checks, issue warnings (or
errors) and perform optimisations. That internal compiler code needs to
be activated somehow and it needs to perform a very specific task (not
something that would be there anyway - something that is written
specially for that one purpose).

If you really wanted to you could say that the function names are
undefined to the compiler, but then you need some way of telling the
compiler which functions are special (what their names are) and what is
special about them (under special circumstances you can replace func_A
+ func_B with func_C). This is why you would need to be able to add
special attributes to a function, so that you can tell the compiler
that these are special functions that need to be handled in a special
way (execute the special internal code for them). (sorry about all the
specials :-)

Other than having special attributes, you really would need to make
these functions intrinsic (known and understood by the compiler) so
that it can execute its special internal code when it sees them in the
user's source code.

Friendly Regards
Sergio Masci

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist