|There is no reason why you can't reset some of the software in your
|"Error" handler, essentially putting the system into its "fail-safe" mode.
|After all, if you set it up that way, the only way your system is going to
|get to the error handler is if it jumps into the middle of one of the
|code spaces you left unprogrammed.

It depends on why you got into the failed state in the first
place.  Often, the only plausible mechanism for reaching
unprogrammed code space is an electrical glitch, and recovery
from such a glitch is probably better done via a reset than a
warm start; better still would be to power-cycle the chip,
but that would require external hardware.

|I fail to see the difference between reacting to a jump failure and
|letting the hardware react to the same jump failure, except perhaps the
|necessity of going through the "Hardware Reset" for what may be a glitch
|that only affects the software one time.

The $10,000,000 question here is what caused the jump table
failure (if that's what killed the system).  Consider a case where
the only way the failure could occur is the CPU executing code
incorrectly, e.g. if I code:

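        ; Table evaluator [starting at address 0x30, so the whole
        ; table lies below address 0x100]
Xlate:
        clrf    PCLATH          ; table lives in the first 256 words
        addwf   PCL,f           ; computed jump: skip W entries
        dt      1,2,4,8, 3,5,7,9, 2,4,6,8, 4,3,2,1  ; 16 RETLWs

        ; Later on...
MainLoop:
        movf    PORTB,w
        andlw   0x0F            ; keep the index in range 0..15
        movwf   LatchB

        ; Do some munging...

        ; Later on
        movf    LatchB,w
        call    Xlate           ; W = table entry for LatchB
        movwf   Result

        ; ...
        goto    MainLoop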

If no code writes to INDF (so no indirect write can land on LatchB),
and the only write to LatchB is the one above, there is no way the
computed jump near the beginning should ever fail: the andlw limits
LatchB to 0..15, which exactly matches the 16-entry table.
Nonetheless, if the chip gets glitched, the value in LatchB could
get corrupted.  Of course, if this DOES happen, there's not much
guarantee that anything else will be as it should be either...
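If you'd rather have the lookup defend itself against a corrupted LatchB, one option is to range-check the index right before using it and divert to the error handler instead of jumping off the end of the table. Here's a rough sketch in C so it can run on a desktop; `xlate_checked`, `XLATE_FAULT` and the `fault_count` hook are hypothetical names, and on a real PIC the fault branch would be your fail-safe routine.

```c
#include <stdint.h>

/* Same 16 entries as the Xlate table above. */
static const uint8_t xlate_table[16] = {
    1, 2, 4, 8,  3, 5, 7, 9,  2, 4, 6, 8,  4, 3, 2, 1
};

/* Hypothetical marker for "index was out of range"; the table only
   holds 1..9, so 0xFF can never be a legitimate result. */
#define XLATE_FAULT 0xFFu

/* Hypothetical stand-in for entering the error/fail-safe handler. */
static unsigned fault_count = 0;

/* Range-check the (possibly corrupted) latch value before indexing,
   instead of trusting that the earlier mask still holds. */
uint8_t xlate_checked(uint8_t latch)
{
    if (latch >= 16u) {
        fault_count++;          /* real code: go to the fail-safe routine */
        return XLATE_FAULT;
    }
    return xlate_table[latch];
}
```

Re-masking LatchB just before the call is the cheaper assembly equivalent for keeping the jump in range, but it silently returns a wrong entry instead of flagging that something went wrong.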


>   If the principle of least astonishment is violated, you're best off
> trusting nothing, running an (at least partial) hardware test, and then
> RESTARTING, i.e. doing either a power-up restart or a Watchdog restart.
> Then you at least know that your hardware's set correctly, etc. -
> because YOU JUST SET IT CORRECTLY.  (You might think of a state machine
> for your project: occasionally, when everything tests OK, save the state
> as a "checkpoint dump".  Should you end up in psychotic code space, TRUST
> NOTHING: restart, load your last checkpoint dump, and work forwards
> from there.  At least that way, if you crash, you don't have to redo
> ALL your work from scratch...  Also, the checkpoint dump can give you an
> idea of what's going on <G>)
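A checkpoint dump is only useful if you can tell a good one from one that was half-written when the crash hit. Here's a minimal sketch, assuming the checkpoint is a small struct (in RAM or EEPROM) protected by a simple inverted additive checksum. The `Checkpoint` fields, `checkpoint_save` and `checkpoint_restore` are hypothetical, and a real design might prefer a CRC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application state worth preserving across a restart. */
typedef struct {
    uint8_t  step;        /* which state of the state machine we were in */
    uint16_t counter;     /* example piece of accumulated work */
    uint8_t  checksum;    /* covers step and counter */
} Checkpoint;

/* Inverted 8-bit sum of the data fields, so an all-zero (never
   written) dump does not look valid. */
static uint8_t checkpoint_sum(const Checkpoint *cp)
{
    uint8_t sum = cp->step;
    sum += (uint8_t)(cp->counter & 0xFFu);
    sum += (uint8_t)(cp->counter >> 8);
    return (uint8_t)~sum;
}

/* The "checkpoint dump"; on a PIC this might live in EEPROM. */
static Checkpoint saved;

/* Call only when everything has tested OK. */
void checkpoint_save(uint8_t step, uint16_t counter)
{
    Checkpoint cp;
    cp.step = step;
    cp.counter = counter;
    cp.checksum = checkpoint_sum(&cp);
    saved = cp;
}

/* After a restart: copy out the last good checkpoint, or return false
   so the caller starts from scratch instead of trusting bad data. */
bool checkpoint_restore(Checkpoint *out)
{
    if (saved.checksum != checkpoint_sum(&saved))
        return false;
    *out = saved;
    return true;
}
```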

I like that idea... I'm just not sure I really want to perform a HARDWARE
reset every time the software glitches...  It makes sense to drop back
to a checkpoint, especially if you checkpoint after you write to an
external device, so you don't end up sending the same message twice...
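Checkpointing right after an external write could be as simple as saving the sequence number of the last message sent. After a restart, anything at or below that number has already gone out and gets skipped. Everything here is hypothetical (`send_messages`, `last_sent`, and the `sent_log` array standing in for the external device). Note that a glitch landing between the write and the checkpoint update can still produce one duplicate.

```c
#include <stdint.h>

/* Hypothetical stand-in for the external device: records what it got. */
static uint16_t sent_log[16];
static unsigned sent_count = 0;

/* Checkpointed value: sequence number of the last message known to
   have been delivered (0 = none yet). */
static uint16_t last_sent = 0;

static void device_write(uint16_t seq)
{
    if (sent_count < 16)
        sent_log[sent_count++] = seq;
}

/* Send messages 1..total, skipping any the checkpoint says are done;
   the checkpoint is updated immediately after each external write. */
void send_messages(uint16_t total)
{
    for (uint32_t seq = (uint32_t)last_sent + 1; seq <= total; seq++) {
        device_write((uint16_t)seq);
        last_sent = (uint16_t)seq;   /* checkpoint right after the write */
    }
}
```

For example, `send_messages(3)` followed by a crash and a restart that calls `send_messages(5)` delivers messages 1 through 5 exactly once.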

>
>   When you work with embedded hardware that controls electronics that
> can quite literally blow up when over-driven, ASSUMING that things are
> safe just isn't a good idea at all.  (Say you were controlling a
> high-powered pulsed RF transmitter with a PIC part.  The transmitter's
> turned on at super high power, and just before it is to be turned off,
> the software crashes.  You then assume that the transmitter's off and go
> ahead and process the next 15 minutes' worth of received data in the PIC,
> setting up for the next transmission, as the transmitter not-so-slowly
> melts into $25,000 worth of slag, but your job's secure and the boss
> will be happy - you assumed it was safe, so it was, right? <G>)  This is
> different from a lamp dimmer or soundmaker, where an occasional "oops"
> just makes the light bulb flash a little brighter or the sound a little
> different than expected.  I for one find that the way you GET good
> habits is to always be very aware of what you're doing when you're
> -developing- habits <G>

I wouldn't be having this discussion if I didn't want advice on
the best way to achieve my goals... I just don't necessarily want the
quick and easy answers, like doing a hardware reset every time you glitch.

It might take a little longer to first try resurrecting the software from
the checkpoint without a complete reset, and then do the reset only if you
end up in the error routine a second time with the same checkpoint.  But at
least you don't have to spend that time just because you assumed the
hardware had a problem when it was only a temporary glitch.
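The "warm restart first, hardware reset only if the same checkpoint fails again" policy boils down to remembering which checkpoint was active at the previous failure. Here's a sketch under that assumption. `error_handler`, `Recovery` and the checkpoint ids are hypothetical, and the remembered id has to live somewhere a warm restart doesn't clear. On a PIC, the hard-reset branch could simply stop clearing the watchdog and let it time out.

```c
#include <stdint.h>

typedef enum {
    RECOVER_WARM_RESTART,   /* reload the last checkpoint and carry on */
    RECOVER_HARD_RESET      /* second failure on the same checkpoint */
} Recovery;

/* Id of the checkpoint that was active at the last failure; must survive
   a warm restart.  0 means "no failure recorded". */
static uint16_t failed_checkpoint = 0;

/* Decide how to recover from a failure that happened while checkpoint
   `current` (never 0) was the latest good one. */
Recovery error_handler(uint16_t current)
{
    if (current == failed_checkpoint) {
        failed_checkpoint = 0;      /* start fresh after the reset */
        return RECOVER_HARD_RESET;
    }
    failed_checkpoint = current;    /* first failure here: remember it */
    return RECOVER_WARM_RESTART;
}
```

If the software gets past a checkpoint and writes a new one before failing again, the id changes and it gets another warm restart. Only a repeat failure with no progress escalates to the reset.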

>   (You can bet that any sane person who had that happen would make sure
> that his power-up / watchdog software turned off the transmitter first
> thing, and THEN went on to other things...)
>
>   Mark
>

Being sane, you'd have your error routine do the same thing...
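In practice, "do the same thing" could mean that the power-up code and the error routine share the same first step: force every hazardous output off before touching anything else. Here's a minimal sketch; `make_outputs_safe`, `power_up`, `error_routine` and the `transmitter_on` flag standing in for a port pin are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for the port pin that keys the transmitter. */
static volatile bool transmitter_on = false;

/* Force every hazardous output to its safe state.  Deliberately uses
   no saved RAM state, since after a glitch none of it can be trusted. */
static void make_outputs_safe(void)
{
    transmitter_on = false;
}

void power_up(void)
{
    make_outputs_safe();    /* first thing, before any other init */
    /* ... configure hardware, restore checkpoint, etc. ... */
}

void error_routine(void)
{
    make_outputs_safe();    /* same first step as power-up */
    /* ... then decide between warm restart and hardware reset ... */
}
```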

In essence, I am wondering why you need the WDT at all if your code is
designed to be fail-safe...

Maybe there is something I don't know about PICs that makes them
different, but when I want to reset my computer, I usually do a WARM BOOT
FIRST rather than pushing the reset button, if only because some of the
drives don't recover well from a hardware reset, whereas they all accept a
warm boot.

                                GREY