On Fri, 14 Mar 2008, Tamas Rudnai wrote:
> Sergio,
>
> On the 'XCSB samples and circuits' section, the led-01.bas is _not_ compiled
> with code optimization I suppose? (It's not a new flame, I am just curious
> about how compact code can be generated by an HLL, how could it be compared
> to what I would write in asm).
>
> Many thanks,
> Tamas

Hi Tamas,

Yes, it is compiled with code optimisation as standard, but the example you
see was compiled with a much older version of XCSB than is available today.

In this example you will find the program startup (present in startup.asm),
the interrupt service prolog and epilog code and the heartbeat timer code
(all of which is also present in startup.asm), and the 32 bit integer
addition and comparison code (used by the big delays).

The few lines of code given in this example would probably compile to more
or less what you might write in assembler. You may also find that a really
good C compiler will produce about the same (for this example). However, a
cheap and cheerful C compiler will generate much more code and it will be
much less efficient (take longer to execute). The same is true of many BASIC
compilers, including the expensive ones.

The real strength of an HLL lies in compiling big programs (certainly much
bigger than the simple example shown). With big programs (ASM and HLL) you
tend to break them down into subroutines (or functions). Each subroutine
would typically need input parameters, local variables and return results.

To make ASM subroutines better than straightforward inline ASM you tend to
try to make them reusable within the same code. So you end up defining a
protocol to pass info between the subroutine and the caller.

You might decide to pass a value to the subroutine by loading it into W and
return the result again in W. This is OK for an 8 bit value, but what about
16 bit values or multiple parameters? You would probably end up reserving
some RAM locations to pass the parameters, and you might need to do the same
thing when you return the result because it won't fit in W. Another
complication is that you might need to use W to calculate the address of the
subroutine you are calling (maybe to set up PCLATH).

To clarify: in one place you might want to pass the value of a variable as
the parameter to your subroutine, in another place you might want to pass a
constant, and in yet another place you might want to pass the value of an
element of an array. So the obvious solution is to define a general purpose
way of passing the parameter through a RAM location.

e.g. (1)

        movlw   (MAX_DLY >> 8) & 0xff
        movwf   arg0+1
        movlw   MAX_DLY & 0xff
        movwf   arg0+0
        call    wait

e.g. (2)

        movf    max_dly+1,w
        movwf   arg0+1
        movf    max_dly+0,w
        movwf   arg0+0
        call    wait

This has two serious implications:

(1) it takes time and instructions to set up the parameter before you call
    the subroutine

(2) the place where you store the parameter must be safe (you cannot put
    anything else there until the called subroutine has finished with it)

An HLL compiler can do all this for you automatically. It will keep track of
the protocol and generate all the necessary code AND it will keep track of
memory locations used for passing those parameters, reusing them for other
subroutines only when it is safe.
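Just to make the callee side of that protocol concrete: the body of the wait
routine called above isn't shown, but a minimal sketch (assuming a mid-range
PIC, with the 16 bit count arriving in arg0+0/arg0+1 as in the examples)
might look something like this:

wait                            ; hypothetical 16 bit busy-wait
        movf    arg0+0,w        ; W = low byte of count
        iorwf   arg0+1,w        ; Z set only if both bytes are zero
        btfsc   STATUS,Z
        return                  ; count exhausted - done
        movlw   1
        subwf   arg0+0,f        ; decrement low byte
        btfss   STATUS,C        ; borrow out of the low byte?
        decf    arg0+1,f        ; yes - propagate into the high byte
        goto    wait

Note that it counts arg0 down in place, which is exactly why the caller has
to treat arg0 as off limits until the call returns - implication (2) above.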
NOW COMES THE MAGIC!!!

As you develop your code, you will notice that sometimes a subroutine you
have written only ever takes a certain variable or constant as a parameter.
So you go back and massage the subroutine so that it no longer needs that
parameter; instead you EMBED the constant or variable directly into the code
of the subroutine. Then you might notice that your subroutine is doing a lot
less work because it no longer needs to play around with extra RAM accesses,
or the pointer you were passing no longer needs to be dereferenced because
you can access the memory directly. So your code gets tighter and tighter
until you notice that it's reduced to only a few instructions, at which
point you might decide to turn it into a macro.

Now imagine a compiler that does all that checking for you each and every
time you make a change.

IT GETS BETTER!!!

The HLL might also be able to optimise the way the result is generated so
that instead of passing the result back through some reserved RAM it
actually computes the result in situ: in the variable where the result needs
to be stored.

e.g.

        proc int sum(int a, int b, int c)
                return a + b + c
        endproc

        x = sum(a, b, c)

generates the same code as:

        x = a + b + c

without any call, return or copy overhead. And this goes even further:

        proc int diff(j, k)
                return j - k
        endproc

        x = sum(a, diff(b, c), d)

generates the same code as:

        x = a + (b - c) + d
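As a rough illustration of what that means at instruction level, here is a
sketch of what both forms could compile down to - assuming, for brevity,
8 bit unsigned variables on a mid-range PIC (a 16 bit int version would just
repeat the pattern for the high bytes with carry handling):

        movf    a,w             ; W = a
        addwf   b,w             ; W = a + b
        addwf   c,w             ; W = a + b + c
        movwf   x               ; result computed straight into x

Three parameter copies, a call, a return and a result copy have all
disappeared.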
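And the nested x = sum(a, diff(b, c), d) could, under the same assumptions,
come out as:

        movf    c,w             ; W = c
        subwf   b,w             ; W = b - c
        addwf   a,w             ; W = a + (b - c)
        addwf   d,w             ; W = a + (b - c) + d
        movwf   x

with diff folded straight into the middle of the expression - no trace of
either call remains.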
TIP OF THE ICEBERG!!!

Now imagine your assembler source code in front of you. You should be able
to break this down into groups of instructions. Within each group you will
have a protocol - the way information passes from instruction to
instruction. You will concentrate on optimising each group so that

e.g. (1)

        movlw   0x80
        iorwf   fred,f

becomes

        bsf     fred,7

e.g. (2)

        movlw   (0x1000000 >> 24) & 0xff
        movwf   fred+3
        movlw   (0x1000000 >> 16) & 0xff
        movwf   fred+2
        movlw   (0x1000000 >> 8) & 0xff
        movwf   fred+1
        movlw   0x1000000 & 0xff
        movwf   fred+0

becomes

        movlw   (0x1000000 >> 24) & 0xff
        movwf   fred+3
        clrf    fred+2
        clrf    fred+1
        clrf    fred+0

and the compiler will do the same as you. But the compiler will also check
obscure things that would probably escape your notice, like

e.g.

MDF     .equ    16777216

        movlw   (MDF >> 24) & 0xff
        movwf   fred+3
        movlw   (MDF >> 16) & 0xff
        movwf   fred+2
        movlw   (MDF >> 8) & 0xff
        movwf   fred+1
        movlw   MDF & 0xff
        movwf   fred+0

Here the compiler would notice that MDF is actually 0x1000000 and use clrf
for the low bytes instead.

Ok, so this intra-group-protocol thing may seem like a big "so what", but it
has real benefits when you start looking at the size of variables. Consider
adding an 8 bit unsigned variable (jack) to a 16 bit variable (fred). You
can either convert the 8 bit value (jack) to a 16 bit value (jack2) and then
add that (to fred) - the simple one-size-fits-all solution

e.g.

        movf    jack,w
        movwf   jack2+0
        clrf    jack2+1

        movf    jack2+0,w
        addwf   fred+0,f
        btfsc   STATUS,C
        incf    fred+1,f
        movf    jack2+1,w
        addwf   fred+1,f

or you can go the extra mile and generate the following optimised code

        movf    jack,w
        addwf   fred+0,f
        btfsc   STATUS,C
        incf    fred+1,f

Now we have three protocols: one that deals with 8 bit to 8 bit, one that
deals with 16 bit to 16 bit and one that deals with 8 bit to 16 bit.

8 bit to 8 bit and 16 bit to 16 bit are easy to do in assembler using
macros, but the more combinations you have to deal with in assembler the
easier it is to get it wrong (cause a bug), because the assembler cannot
keep track of the size of a variable and use the appropriate macro - the
assembler programmer has to do that.

Ok, only 3 protocols, not a big deal you might argue. What about adding some
more useful ones then:

        8 bit to 32 bit
        16 bit to 32 bit
        32 bit to 32 bit

Then there are also the constant variations:

        8 bit constant to 8 bit
        16 bit constant to 16 bit
        32 bit constant to 32 bit
        8 bit constant to 16 bit
        16 bit constant to 32 bit

And then there are the variations where W holds the least significant 8 bits
of the value, and variations dealing with the value being pointed to by FSR.

The point is that there are lots of different intra-group-protocols to
choose from in order to generate optimised code. Just defining a few macros
won't cut it.

You will also have a protocol between groups of instructions so that (e.g.)
one group leaves something ready for another group to process, or (e.g.) a
loop counter is set up at the start of a loop, used in the loop and
decremented at the end of the loop. The compiler is also performing
optimisations based on the way these groups interact

e.g.

        for j=0 while j<10 step j=j+1 do
        done

would generate

        movlw   10
        movwf   j
lab1
        decfsz  j,f
        goto    lab1

Note that the compiler has replaced the up-counting loop and its compare
against 10 with a simple down-count, because decfsz is far cheaper than an
explicit comparison.

AND FINALLY!!!

I keep talking about protocols as though they are a big thing - well, they
are. If you have a rigid protocol then you can end up with some severe
limitations. e.g. in C, if you use a dynamic stack and you insist on passing
everything on the stack, then there is little that you can do to optimise a
call. Similarly, if all your arithmetic is based on 16 bit ints then there
is a big penalty and optimisation suffers.

A good compiler will have lots of protocols at its disposal; it will analyse
the HLL source code and it will be constantly selecting the protocol which
allows it to generate the best executable code. A really good assembler
programmer CAN do all that, but a good compiler WILL do all this every time
there is the slightest change to the source.

Regards
Sergio

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist