GRAEME SMITH                         email: grysmith@freenet.edmonton.ab.ca
YMCA Edmonton

Address has changed with little warning!
(I moved across the hall! :) )

Email will remain constant... at least for now.


On Mon, 15 Feb 1999, Mark Willis wrote:

> Graeme Smith wrote:
> > <snipped>
> >         (un-programmed state) with a jump to a pre-known state, to redirect
> >         errant programs back into the main loop, in a known safe state.
> >
> >                                         GREY
>
>   What I think everyone here's saying is that they've looked for such a
> beastie, and there isn't such a beast as a "known safe state" once you've
> ended up in an unknown state through some unknown means.
>
Well, you seem to be assuming that a WDT (watchdog timer) gives you a
"known safe state" (as long as you write fail-safe restart code).

There is no reason why you can't reset some of the software in your
"error" handler, essentially putting the system into its fail-safe mode.
After all, if you set it up that way, the only way your system is going to
get to the error handler is if it jumps into the middle of one of the
code spaces you left unprogrammed.
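A minimal sketch of such an error handler, written as host-runnable C rather than PIC code so the logic is easy to follow. It assumes every unprogrammed code location has already been filled (by the assembler or linker, not shown) with a jump to `error_handler()`; the names `app_state`, `transmitter_on`, and `drive_outputs_safe` are hypothetical stand-ins for real port writes and program variables.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical program state; on a real PIC these would be file registers. */
struct app_state {
    int mode;       /* 0 = idle, the designated fail-safe mode */
    int step;       /* position within the current task */
    int rx_count;   /* received bytes waiting to be processed */
};

static struct app_state app;
static bool transmitter_on = false;  /* stands in for an output port bit */
static int error_entries = 0;        /* how many times the trap has fired */

/* Force every output that could do damage into its safe level. */
static void drive_outputs_safe(void)
{
    transmitter_on = false;
}

/* Reached (on the target) via the jump planted in unprogrammed code space.
 * Outputs are made safe first, then software state is rebuilt from known
 * values; no hardware reset is performed here. */
static void error_handler(void)
{
    drive_outputs_safe();
    memset(&app, 0, sizeof app);     /* mode 0 = idle / fail-safe */
    error_entries++;
}
```

Making the outputs safe before touching anything else is the same ordering a careful power-up or watchdog restart routine would use.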

I fail to see the difference between reacting to a jump failure in
software and letting the hardware react to the same jump failure, except
perhaps that the hardware route forces a full hardware reset for what may
be a glitch that only affects the software once.

>   If the principle of least astonishment is voided, you're best off to
> trust nothing, run an (at least partial) hardware test, and RESTART
> otherwise, i.e. either do a power-up restart or a Watchdog restart.
> Then, you at least know that your hardware's set correctly, etc. -
> because YOU JUST SET IT CORRECTLY.  (You might think of a state machine
> for your project - occasionally when everything tests OK, save state
> "Checkpoint dump", should you end up in psychotic code space, TRUST
> NOTHING, restart, and load your last checkpoint dump and work forwards
> from there.  At least that way if you crash, you don't have to duplicate
> ALL your work from scratch...  Also, the checkpoint dump can give you an
> idea of what's going on <G>)

I like that idea... I'm just not sure I really want to perform a HARDWARE
reset every time the software glitches. It makes sense to drop back
to a checkpoint, especially if you checkpoint right after you write to an
external device, so you don't end up sending the same message twice...
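Here is a rough sketch of checkpointing right after each external write, in host-runnable C. It assumes the checkpoint sits in spare RAM or EEPROM and is validated with a simple additive checksum; `struct checkpoint`, `send_message`, and the sequence-number scheme (starting at 1) are hypothetical, chosen only to show how a restart can avoid resending a message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of everything needed to resume the main loop. */
struct checkpoint {
    uint8_t  mode;
    uint8_t  msg_seq;   /* last message number already sent to the device */
    uint16_t sum;       /* checksum over the fields above */
};

static struct checkpoint saved;  /* would live in EEPROM or spare RAM */
static int messages_sent = 0;    /* counts writes to the external device */

/* The 0xA5 seed makes an all-zero (never written) checkpoint invalid. */
static uint16_t checkpoint_sum(const struct checkpoint *cp)
{
    return (uint16_t)(0xA5u + cp->mode + cp->msg_seq);
}

static void save_checkpoint(uint8_t mode, uint8_t msg_seq)
{
    saved.mode = mode;
    saved.msg_seq = msg_seq;
    saved.sum = checkpoint_sum(&saved);
}

/* Returns true and fills *out only if the stored checkpoint is intact. */
static bool load_checkpoint(struct checkpoint *out)
{
    if (saved.sum != checkpoint_sum(&saved))
        return false;
    *out = saved;
    return true;
}

/* Send message number `seq` unless the checkpoint says it already went
 * out, then checkpoint immediately so a restart will not repeat it. */
static void send_message(uint8_t mode, uint8_t seq)
{
    struct checkpoint cp;
    if (load_checkpoint(&cp) && cp.msg_seq >= seq)
        return;                  /* already sent before the glitch */
    messages_sent++;             /* stands in for the real device write */
    save_checkpoint(mode, seq);
}
```

If the checksum fails, the checkpoint can't be trusted, and that is the point where falling back to a full restart makes sense.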

>
>   When you work with embedded hardware that controls electronics that
> can quite literally blow up when over-driven, ASSUMING that things are
> safe just isn't a good idea at all.  (Say you were controlling a
> high-powered pulsed RF transmitter with a PIC part; the transmitter's
> turned on at super high power, and just before it is to be turned off,
> the software crashes;  You then assume that the transmitter's off and go
> ahead and process the next 15 minutes worth of received data in the PIC,
> setting up for the next transmission, as the transmitter not-so-slowly
> melts into $25,000 worth of slag, but your job's secure and the boss
> will be happy - you assumed it was safe, so it was, right? <G>)  This is
> different than a lamp dimmer or soundmaker where an occasional "oops"
> just makes the light bulb flash a little brighter or the sound a little
> different than expected;  I for one find that the way you GET good
> habits, is to always be very aware of what you're doing, when you're
> -developing- habits <G>

I wouldn't be having this discussion if I didn't want advice on
the best way to achieve my goals. I just don't necessarily want the
quick and easy answers, like doing a hardware reset every time you glitch.

It might take a little longer to first try resurrecting the software from
the checkpoint without a complete reset, and only do the reset if you end
up in the error routine a second time with the same checkpoint. But at
least you don't pay for a full reset just because you assumed the hardware
was having problems when it was only a temporary glitch.

>   (You can bet that any sane person who had that one would make sure
> that his power-up / watchdog software turned off the transmitter, first
> thing.  THEN went on to other things...)
>
>   Mark
>

Being sane, you would have your error routine do the same thing.

In essence, I am wondering why you need the WDT at all if your code is
designed to be fail-safe.

Maybe there is something I don't know about PICs that makes them
different, but when I want to reset my computer, I usually try a WARM BOOT
FIRST rather than pushing the reset button, if only because some of the
drives don't reset well from a hardware reset, but they all accept a warm boot.

                                GREY