On Fri, 24 Jul 2009, Gerhard Fiedler wrote:

> sergio masci wrote:
>
> >> I suspect that in many compilers (3) and (2) end up (as an intermediate representation) in something that's equivalent to (1). But even if not, it shouldn't be too difficult to make all three end up in the same, whatever the compiler's internal representation of this statement is.
> >
> > Ok so I write a separate parser for all three and internally they all produce:
> >
> > if_statement
> > .. expr
> > .. .. lt
> > .. .. .. x
> > .. .. .. 0
> > .. statement
> > .. .. assign
> > .. .. .. y
> > .. .. .. 0
> > .. statement
> > .. .. assign
> > .. .. .. y
> > .. .. .. x
> >
> > So what advantage does this give me?
>
> I don't know, but if you do it, you probably know :) Seriously, I don't understand the question in this context.

Ok, I understand now, you're talking about going backwards (everything towards the type (1) syntax) whereas I was saying something else - see below.

> > If instead of the keyword 'if' I used a function 'xyz' thus:
> >
> > xyz( lt(x, 0), assign(y, 0), assign(y, x) )
> >
> > my parsers would all now produce:
> >
> > expr
> > .. func
> > .. .. xyz
> > .. .. expr
> > .. .. .. lt
> > .. .. .. .. x
> > .. .. .. .. 0
> > .. .. expr
> > .. .. .. assign
> > .. .. .. .. y
> > .. .. .. .. 0
> > .. .. expr
> > .. .. .. assign
> > .. .. .. .. y
> > .. .. .. .. x
> >
> > Internally the compiler has different sections that deal with generating code for 'if_statement' trees, 'statement' trees and 'expr' trees.
> >
> > How would you propose that I treat 'xyz' differently while parsing, and how would you add a specific code generator for the now special 'xyz' (you need to describe all this somehow)?
>
> 'if' is a special statement/construct/function, defined in the language standard. 'xyz' is not. Therefore, the compiler can have (and generally has) special code to generate 'if' more efficiently than a function call.
>
> Compared to the original issue -- lists --, 'if' is much more simple, and the function call overhead here is important and more than the actual functionality typically would be. That's why it generally makes sense to implement such a construct directly, avoiding the function call overhead.

But it's not just the function call overhead that's important here. If we were to implement the 'if' statement as a function, the calling protocol would be completely different to that of a normal function, regardless of any special optimisations. In the 'if' statement, either the 'true' or the 'false' statement is executed only *** AFTER *** the 'if' condition is evaluated. In the 'if' function, the condition and both the 'true' and 'false' statements (all 3 parameters of the 'if' function) are evaluated before the 'if' function is called.

Trying to say that you can achieve the same functionality by extending the language using functions as you can by adding special keywords does not hold. You'd have to say "some functions are called like this, some are called like that, some are called in bizarro mode N etc etc etc..." The whole thing would become a mess and you'd have been forced to change the way the language works anyway :)

I am trying to show that internally there is a fundamental difference between the way statements are parsed and code is generated for them and the way functions are parsed and code is generated for them. Just because you parse both and generate trees for both it doesn't mean that the trees are the same.
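To make the difference in calling protocol concrete, here is a small sketch of my own (plain C++ used purely for compactness - the point isn't C++-specific, and the names xyz / xyz_lazy are simply made up to mirror the example above). With an ordinary function, both "branches" have already run before xyz is even entered; to get statement-like behaviour the calling protocol itself has to change, for example by passing the branches as unevaluated functions:

    #include <cstdio>

    // 'if' written as an ordinary function: by the time xyz() runs, BOTH
    // branch expressions have already been evaluated, side effects and all.
    int xyz(bool cond, int when_true, int when_false)
    {
        return cond ? when_true : when_false;
    }

    // To behave like a real 'if', the protocol has to change: the branches are
    // passed unevaluated (as functions here) and only one of them is called,
    // and only after the condition has been evaluated.
    int xyz_lazy(bool cond, int (*when_true)(int), int (*when_false)(int), int x)
    {
        return cond ? when_true(x) : when_false(x);
    }

    int zero(int)   { std::puts("evaluating 'true' branch");  return 0; }
    int same(int x) { std::puts("evaluating 'false' branch"); return x; }

    int main()
    {
        int x = 5;

        // Prints BOTH messages: both branches ran before xyz was entered.
        int y = xyz(x < 0, zero(x), same(x));

        // Prints only the 'false' branch message, and only after x < 0 has
        // been evaluated - the behaviour of a real 'if' statement.
        y = xyz_lazy(x < 0, zero, same, x);

        (void)y;
        return 0;
    }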
Ok, so I'll go along with you on this and I'll trap certain functions in my 'new' compiler so that they give you the result you want (delayed evaluation of parameters). But can you not see that the amount of work going into the compiler proper (by the compiler writer) is huge? The language looks like it's being extended by adding functions, but these functions have a large component buried inside the compiler. So these functions (in the /standard/ library), apart from having the illusion of being external to the compiler, are now fundamental to it and could cause the compiler to generate bad code if they are touched. The only thing I can see that you might gain from this is better optimisation of old code.

> I'm not a compiler specialist, but it could be that 'avoiding the function call overhead' is a premature optimization, and that a later, lower-level optimization could result in just this. In any case, since 'if' is defined in the language standard, there's nothing that would prevent a compiler writer to implement it in the compiler.

Yes, but the point isn't about making the 'if' function behave like an 'if' statement, it's about how to make other functions also behave in a special way without doing a lot of work in the compiler to treat each function as a special case. If you're going to do all this extra work to the compiler, don't kid yourself that the /standard/ library is actually helping the compiler writer - it isn't. The /standard/ library only came into being to help the user get around shortcomings in the language, and it's only helping the compiler writer if he/she doesn't need to do anything special with it.

> > the 'if_statement' code generator knows that it might have either one or two statements following the condition expression. It also knows that if the condition expression evaluates to a compile time constant that it can discard either the 'true' or 'false' statements. It also needs to interact with the code generator for the condition expression to allow that generator to produce efficient optimised code jumps to the 'true' or 'false' statements (think of early out logical expressions involving '&&' and '||'). Consider the difference between "if (cond)..." and "X=cond"
>
> Yes. This all is pretty much C. I thought we weren't really talking about any specific languages, but about 'implemented in the compiler' versus 'implemented in a library'.

I'm not talking specifically about C, that was just an example so people could relate. Other languages provide early out logical expressions and the ability to assign the result of a condition to a variable. It wouldn't have helped much if I'd written the example in ALGOL. So I'm not pointing the finger at C and saying "see, C is bad", I'm saying "look, when you evaluate a logical expression there's more to it than a simple 'x==y' and you need to be able to cope with all kinds of nasty stuff".
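To give one concrete example of that "nasty stuff" (my own sketch, nothing here is specific to any one language): with early-out evaluation the right-hand operand of '&&' may never be evaluated at all, and a condition used directly by an 'if' never needs to exist as a stored value, whereas assigning it to a variable forces the compiler to materialise an actual 0/1 result:

    #include <cstdio>

    bool in_range(int* p)
    {
        std::puts("in_range evaluated");
        return *p >= 0 && *p < 100;
    }

    int main()
    {
        int* p = 0;   // null pointer

        // Early out: p != 0 is false, so in_range(p) is never called (and must
        // not be - it would dereference a null pointer). The 'if' code generator
        // can jump straight to the else part; the condition as a whole never
        // needs to be stored anywhere as a value.
        if (p != 0 && in_range(p))
            std::puts("p points at a value in range");
        else
            std::puts("p is null or out of range");

        // Same condition assigned to a variable: the early-out jumps are still
        // needed, but now the compiler must also materialise a 0/1 result in 'ok'.
        bool ok = (p != 0 && in_range(p));
        std::printf("ok = %d\n", ok);

        return 0;
    }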
> The 'if' statement implementation is so short that going through a general-purpose function call convention would blow up the code tremendously. But:
>
> 1) Nobody says that the compiler writer /has/ to do this. The 'if' statement is defined in the language standard, and the compiler writer can choose to implement it directly in the compiler.
>
> 2) Even if the compiler writer chooses to implement it as function call, I think it is possible that a lower-level optimization detects the inefficiencies and successfully optimizes the function call away (remember that I considered that the function is available as source), reaching the same code as if implemented directly.

I really don't think you would be able to have an 'if' function as source. How would you describe the function to the compiler if you have no built-in conditional statement? Would you resort to assembler? How would you end up optimising this? Would your compiler need a built-in simulator to be able to run the code internally? I mean, if you have a cross compiler that runs on a PC and generates code for a PIC, would you need a PIC simulator inside your compiler to be able to execute the assembler for the 'if' function?

> >> So, yes, I think for a compiler these three could be identical. I don't see that a compiler could derive any information from any of them that it couldn't derive from the other two.
> >
> > Actually it can. Consider a long complex program made up purely of functions as in (1). What happens with a misplaced comma or parenthesis?
>
> That's a feature of that specific syntax, not a difference whether the 'if' function is implemented in the compiler or in a library. We didn't discuss the various merits of the different syntaxes (sp ?? :)
>
> > The verbose syntax lets the compiler catch silly mistakes.
>
> Of course. The more redundant (that is, verbose) the syntax is, the easier it is both for the programmer to get something wrong and for the compiler to catch when something is wrong. But we didn't discuss the merits of different syntaxes, we discussed merits of 'implemented in the compiler' versus 'implemented in a standard library'.

But one of the points I've been making all along is having the compiler identify errors.

> >> Provided, of course, that the functions used in (1) are just as defined as the operators and statements used in (2) and (3).
> >
> > Ok so I'll give you that, all the functions are defined in exactly the same way in (1) as they are in (2) and (3). But what are we going to do about the vast number of functions defined in the /standard/ library?
>
> I don't understand the question. I didn't mean to suggest that all functions need to have an equivalent in the forms (2) or (3), but rather the other way round, that typically constructs of the forms (2) or (3) have an equivalent function call syntax that does the same (functionally, not necessarily in terms of typing or user-friendliness).

Ok, so if I understand you correctly, what you are saying is "if any program written in syntax X can be converted to the equivalent as if it had actually been written in syntax (1) (the function call syntax), then the function call syntax is powerful enough to do anything that a much more complex syntax would allow". This is what you are saying, right? Ok, assuming you are: the problem with this is the intelligence needed within the compiler to understand the intent of the programmer. This is what I've been pointing at all along. Ok, so I haven't gotten my point across. Look, it's like this: any (correct) high level language program can be converted by a compiler into a machine code executable (by definition that's what compilers do), but taking an executable and going the other way (using a program to convert it back to high level language source) is very difficult. The higher the level of the language you want the "de-compiler" to convert to, the harder it gets. The ultimate would be to say "this decompiler explains in common English what this executable does" (a documentor?).
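To put a made-up but concrete example behind that: here are three source-level ways of saying "clamp a negative value to zero". A typical optimising compiler will boil all three down to much the same compare-and-branch (or conditional move) sequence, so a de-compiler looking only at the output has no way of telling which form - or which intent - the programmer started from:

    // Three ways of writing "clamp a negative value to zero". The generated
    // machine code ends up essentially the same, so the original form and the
    // programmer's intent cannot be recovered from the executable.

    int clamp_a(int x)
    {
        int y;
        if (x < 0)              // explicit if statement
            y = 0;
        else
            y = x;
        return y;
    }

    int clamp_b(int x)
    {
        return x < 0 ? 0 : x;   // conditional expression
    }

    int clamp_c(int x)
    {
        return (x > 0) * x;     // arithmetic trick, no visible branch at all
    }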
If you use a higher level language than C (with built-in STRINGS, LISTS, STACKS, MULTITASKING etc) you are starting from a higher level and therefore impart greater "intent" into your program. It's like me saying "turn on the lamp connected to relay 6" and the compiler saying "wait a minute, you don't have a lamp on relay 6, you have a freezer connected to that relay", as opposed to saying "PORTB |= 1 << 3". As I tried to show in an earlier post (where I showed many different ways to append the ASCII representation of a number to a string), trying to understand the intent of a programmer by looking at lots of function calls is *** VERY *** difficult.

> >>> AND because the compiler should be able to "understand" functions as easily as other language statements that it is a convenient way to extend the language.
> >>
> >> Yes, extend or customize. That's approximately the C++ standard library way.
> >
> > But the C++ way is horrible! You have CLASS upon CLASS upon CLASS. If you want to write a modest program you end up so deep in 'standard' classes and templates that it gets very hard to see the wood for the trees.
>
> This is not about a specific implementation of the principle, this is about the principle. You always bring in C, despite (or because?) we already agreed that the C way is pretty much horrible. And we probably can agree that the BASIC way is horrible, too -- some exceptions notwithstanding :)
>
> > This nonsense that user classes should be written in such a way as to have special methods that the standard libraries expect (things like iterators) so that the items in a container can be accessed. The programmer shouldn't need to know about all this. He should be able to just say (e.g.)
> >
> >    for all items in list FRED do
> >        *.x = $.x + 1
> >    done
>
> I don't really understand this. This is probably your syntax, and quite familiar to you, but I don't think a majority of list readers here would know what this does. In any case, I don't.

No, this is not familiar to me at all. I made it up on the spur of the moment. It probably doesn't make sense because "*.x = $.x + 1" should read "$.x = $.x + 1" :-)

> Anyway, for what it's worth, and independently of the issue we're discussing (compiler-built-in vs standard-library-implemented), my C++ code looks similar:
>
>    BOOST_FOREACH( item i, FRED ) {
>        ++i.x;
>    }

There you go then, you could have had a reasonable go at understanding what I wrote :)

> (If this is what your code does... since I don't know what it does, I can't really tell, but it probably is trivial to correct it if it's doing something different. And I don't generally use identifiers like FRED for lists in C++, but that's only a style question.)
>
> But again, this is not a discussion of C++ style syntax versus BASIC style syntax (yet :) -- and I don't see anything particularly advantageous about the C++ style syntax. But all of C++'s list handling is implemented in libraries -- and that's the issue.
>
> (I used here an element from the Boost library. It's not a standard library, but it could be one. Whether or not a given library is a standard library is just a matter of definition, and not of principle.)
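For comparison, and purely as a sketch of my own (assuming FRED is a std::list<item> where item has a public member x, and bump_all is just a made-up name), this is roughly the iterator boilerplate the programmer has to write out by hand when neither a built-in construct nor a helper like BOOST_FOREACH is used - which is exactly the machinery I'm saying the user shouldn't need to know about:

    #include <list>

    struct item { int x; };

    void bump_all(std::list<item>& FRED)
    {
        // The container exposes begin()/end() iterators, and the loop has to
        // spell out the whole protocol that "for all items in list FRED do"
        // (or the BOOST_FOREACH macro) would otherwise hide.
        for (std::list<item>::iterator it = FRED.begin(); it != FRED.end(); ++it)
        {
            it->x = it->x + 1;
        }
    }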
You keep referring back to BASIC. Do you think I'm pushing BASIC here? I'm thinking about other things here, like SQL, COBOL, ALGOL, ICON (you should really look at ICON - a lovely language, with lots of built-in types).

Friendly Regards
Sergio Masci

--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist