At 08:19 AM 5/9/2010, Olin Lathrop wrote:

>I think watchdogs, particularly internal ones, are way overrated.  All they
>are going to do is reset the part when a particular kind of software bug
>occurs.

I think that watchdog timers are a great tool when used 
appropriately.  I normally don't think of them as catching software 
faults, but rather as one means of getting the processor back on 
track if corrupted by some external event.

I have to preface the following comments with a couple of important 
observations: I'm not a great programmer.  I write code that works 
well but it often takes me way longer to get to the finished product 
than it should.

I'm more of a hardware guy who got into writing firmware.  I 
generally look at things more from a discrete hardware perspective 
than a 'real' software person would.

That said: here goes.

I tend to structure my programs in one of two distinct ways:  simple 
programs are along the lines of a cooperative multi-tasking loop, 
while more complex timing-type programs are more of a linear 
progression from start to finish.

The simple multi-tasking loop is just that: a loop that repeats at a 
(usually) 1ms rate.  It starts at the top and does everything that it 
needs to do.  When it reaches the end of the loop, it calls the 
background task.  The background task kicks the watchdog, then looks 
after all background stuff - RTCC, a/d, timers, SPI communications, 
etc.  Upon return from the background task, the main loop jumps back 
to its beginning and the whole thing repeats.
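
For readers who want to see the shape of such a loop, here is a minimal 
sketch in C.  It is only an illustration, not the code from the post: 
the names kick_watchdog(), wait_for_tick(), task_a() and task_b() are 
hypothetical, the hardware is replaced by counters so it runs on a PC, 
and pacing the loop inside the background task is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-ins for hardware so the sketch runs on a host PC. */
static uint32_t wdt_kicks;    /* how many times the watchdog was kicked */
static uint32_t ticks_ms;     /* simulated 1 ms tick counter            */
static uint32_t task_a_runs;
static uint32_t task_b_runs;

void kick_watchdog(void) { wdt_kicks++; }  /* e.g. CLRWDT on a real PIC      */
void wait_for_tick(void) { ticks_ms++;  }  /* would poll a timer flag (1 ms) */

/* Background task: kick the watchdog first, then do the housekeeping. */
void background_task(void)
{
    kick_watchdog();
    /* RTCC, a/d, software timers, SPI etc. would be serviced here. */
    wait_for_tick();          /* assumption: loop pacing happens here */
}

/* Foreground tasks: each does a small amount of work and returns. */
void task_a(void) { task_a_runs++; }
void task_b(void) { task_b_runs++; }

/* One pass = every foreground task once, then the background task. */
void run_main_loop(uint32_t passes)
{
    for (uint32_t i = 0; i < passes; i++) {
        task_a();
        task_b();
        background_task();
    }
}
```

On a real target the bounded loop would be an endless for (;;); the 
pass count is there only so the sketch can be exercised on a desktop.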

I mentioned that the main loop can function as a simple cooperative 
multi-tasking system.  It does that by implementing each task as a 
state machine.  Each state machine yields its time slot when it has 
nothing further to do.

The idea is that each separate task needs only a few (to a few dozen) 
cycles - there is lots of room and lots of time for lots of state machines.
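
A task written as a state machine might look roughly like the sketch 
below.  The blinker itself, its names (blink_task_t, blink_task_step) 
and the 100/400 tick counts are all made up for illustration; the 
point is only that each call does one small step and then returns, 
which is how the task yields its time slot.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { BLINK_OFF, BLINK_ON } blink_state_t;

typedef struct {
    blink_state_t state;       /* current state of this task             */
    uint16_t      ticks_left;  /* loop passes to wait before next change */
    bool          output;      /* stand-in for an output pin             */
} blink_task_t;

/* Called once per pass of the main loop.  Never blocks. */
void blink_task_step(blink_task_t *t)
{
    if (t->ticks_left > 0) {   /* nothing to do yet: count down and yield */
        t->ticks_left--;
        return;
    }

    switch (t->state) {
    case BLINK_OFF:            /* off time is over: switch on */
        t->output     = true;
        t->state      = BLINK_ON;
        t->ticks_left = 100;   /* hypothetical on time, in loop passes */
        break;
    case BLINK_ON:             /* on time is over: switch off */
        t->output     = false;
        t->state      = BLINK_OFF;
        t->ticks_left = 400;   /* hypothetical off time, in loop passes */
        break;
    }
}
```

Most calls take the early return, so the cost per pass is a compare 
and a decrement, which is why there is room for many such tasks.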

Interrupts mesh nicely with this concept: each ISR does what it needs 
to do, then sets a flag to tell the foreground task that it needs to 
deal with something.  Again, individual ISRs tend to be quick and short.
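
The ISR-sets-a-flag pattern can be sketched as follows.  This is a 
host-side illustration with hypothetical names (uart_rx_isr, 
foreground_poll_rx); on real hardware the ISR would read the byte 
from a peripheral register rather than take it as a parameter.

```c
#include <stdint.h>
#include <stdbool.h>

/* Shared between ISR and foreground, hence volatile. */
static volatile bool    rx_ready;  /* set by ISR, cleared by foreground */
static volatile uint8_t rx_byte;   /* data captured by the ISR          */

static uint32_t bytes_handled;     /* foreground bookkeeping */
static uint8_t  last_byte;

/* ISR: grab the data, raise the flag, get out. */
void uart_rx_isr(uint8_t hw_data)
{
    rx_byte  = hw_data;
    rx_ready = true;
}

/* Foreground: called every pass of the loop; returns at once if idle. */
void foreground_poll_rx(void)
{
    if (!rx_ready) {
        return;                    /* nothing pending: yield */
    }
    uint8_t b = rx_byte;           /* copy before clearing the flag */
    rx_ready  = false;

    last_byte = b;                 /* the real processing would go here */
    bytes_handled++;
}
```

With a single flag and a single byte, a second interrupt that arrives 
before the foreground gets around to the first one overwrites it; a 
real design may need a small buffer or a brief interrupt disable 
around the copy.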


The longer timing-type programs aren't as well suited to a simple, 
relatively short loop.  Instead, they function more like traditional 
programs, with a start and an end.  A typical example is what I use 
in our gas-fired catalytic-heater industrial process oven 
controllers: a start-up cycle of many steps followed by an operate cycle.

In those programs, each phase of a cycle is designed to do or wait 
for one particular thing.  In the oven controllers, it's waiting for 
either temperature or time or both.  While it's waiting, it repeatedly 
calls the background task.
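
One phase of such a program could be sketched like this.  It is an 
assumption-laden illustration, not the oven controller code: 
wait_for_temperature(), the simulated temperature that climbs one 
degree per call, and all the numbers are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

static uint32_t now_ms;        /* simulated millisecond clock        */
static int16_t  oven_temp_c;   /* simulated temperature reading      */
static uint32_t wdt_kicks;     /* counts watchdog kicks for the demo */

/* Background task: watchdog first, then housekeeping. */
void background_task(void)
{
    wdt_kicks++;               /* stand-in for kicking the watchdog  */
    now_ms++;                  /* stand-in for the 1 ms tick         */
    if (oven_temp_c < 400) {   /* fake heating so the demo can end   */
        oven_temp_c++;
    }
}

/* One phase: hold here until the target is reached or time runs out.
 * Returns true on success, false on timeout. */
bool wait_for_temperature(int16_t target_c, uint32_t timeout_ms)
{
    uint32_t start = now_ms;

    while (oven_temp_c < target_c) {
        if ((uint32_t)(now_ms - start) >= timeout_ms) {
            return false;      /* took too long: caller decides what next */
        }
        background_task();     /* keeps watchdog and housekeeping alive */
    }
    return true;
}
```

A start-up cycle would then read as a straight-line series of such 
calls, each one checked before moving on to the next phase.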

Again, the background task looks after all of the house-keeping: kick 
the watchdog, deal with RTCC, timers, slow SPI communications (one 
clock edge per tick), a/d, etc.  In my oven controllers, the 
background task also contains the entire remote communications task 
as well as calling one or two other self-contained tasks.

In this case, even though the main loop is held (stopped) at discrete 
points within each phase of the various cycles, it gives the illusion 
of multi-tasking because the background task is being called repeatedly.

Interrupts also fit into this concept nicely.  In this case, though, 
it's usually the background task that is looking for and dealing with 
flags set by ISRs.


Any projects that I build that have safety issues (such as the 
gas-fired oven controllers mentioned above) have more than one 
watchdog.  The external watchdog timer I've been using for many years 
now is the Xicor X5043 system supervisor - it contains a watchdog 
timer, power-supply supervisor and 512 bytes of eeprom.  I still use 
that eeprom to store configuration information even though modern 
PICs have built-in eeprom storage - mostly out of inertia, I 
guess.  But the Xicor chips have been bullet-proof - I *never* have 
issues with corrupted eeprom contents.

The other thing that the Xicor chip does is drive an external timer 
that is used to disable certain outputs if needed.  Basically, this 
timer is driven by the reset line and is a pulse-stretcher with a 
period about two or three times the watchdog reset repetition 
period.  It keeps hazardous outputs disabled if the processor can't 
keep the external watchdog happy as well as when the power supply is 
below safe operating levels.
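
The pulse-stretcher is hardware, but its behaviour can be modelled in 
a few lines of C to show why the outputs stay off while the part 
keeps getting reset.  Everything here is a hypothetical model: the 
200 ms reset repetition period and the 3x hold-off are assumed 
values, not figures from the actual design.

```c
#include <stdint.h>
#include <stdbool.h>

#define WDT_RESET_PERIOD_MS 200u                        /* assumed value */
#define HOLDOFF_MS          (3u * WDT_RESET_PERIOD_MS)  /* assumed 3x    */

typedef struct {
    uint32_t holdoff_left_ms;  /* time until outputs may be enabled */
} stretcher_t;

/* Model of a retriggerable timer, advanced once per millisecond.
 * Every reset pulse reloads the hold-off; otherwise it runs down. */
void stretcher_tick_1ms(stretcher_t *s, bool reset_asserted)
{
    if (reset_asserted) {
        s->holdoff_left_ms = HOLDOFF_MS;
    } else if (s->holdoff_left_ms > 0) {
        s->holdoff_left_ms--;
    }
}

/* Hazardous outputs are allowed only once the hold-off has expired. */
bool outputs_enabled(const stretcher_t *s)
{
    return s->holdoff_left_ms == 0;
}
```

Because the hold-off is longer than the gap between successive 
resets, a processor that is stuck in a reset loop never gets a window 
in which the outputs come back on.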


Anyway, the point I'm trying to make is that I do NOT count on the 
watchdog(s) to catch software errors.  That's not its job!  Instead, 
the watchdog is used both to recover if the system is corrupted by 
some external event and to disable hazardous outputs if the 
system is not working correctly.

The other point I should make is that I do NOT use interrupts to call 
my background task.  That would defeat most of the safety aspects of 
the watchdog - it's quite possible that the foreground task is off in 
lala land but the timer interrupt is still working 
properly.  Instead, the background task must be called by the 
foreground.  Only then is the watchdog timer reset.
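
A small host-side simulation makes the point concrete.  The names and 
the 50-tick timeout are hypothetical, and watchdog_hw_tick() merely 
stands in for the watchdog hardware counting down on its own.

```c
#include <stdint.h>
#include <stdbool.h>

#define WDT_TIMEOUT_TICKS 50u          /* assumed timeout, in 1 ms ticks */

static uint32_t      wdt_left = WDT_TIMEOUT_TICKS;
static bool          wdt_expired;      /* true = the part would be reset */
static volatile bool tick_flag;        /* set by ISR, used by background */

/* Timer ISR: keeps firing even when the foreground is stuck.
 * Deliberately does NOT kick the watchdog. */
void timer_isr(void)
{
    tick_flag = true;
}

/* Stand-in for the watchdog hardware counting down by itself. */
void watchdog_hw_tick(void)
{
    if (wdt_left > 0 && --wdt_left == 0) {
        wdt_expired = true;
    }
}

/* Background task: reachable only through a call from the foreground. */
void background_task(void)
{
    wdt_left = WDT_TIMEOUT_TICKS;      /* the one and only watchdog kick */
    if (tick_flag) {
        tick_flag = false;
        /* RTCC, timers and other housekeeping would run here */
    }
}

/* Advance simulated time by one millisecond. */
void sim_one_ms(void)
{
    timer_isr();
    watchdog_hw_tick();
}
```

If the kick were moved into timer_isr(), the watchdog would stay 
happy for as long as the timer kept running, no matter where the 
foreground had wandered off to.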

I'm really interested in hearing other people comment on this subject 
- it's a great learning opportunity.  Olin - many thanks for making your post!

dwayne

-- 
Dwayne Reid   <dwayner@planet.eon.net>
Trinity Electronics Systems Ltd    Edmonton, AB, CANADA
(780) 489-3199 voice          (780) 487-6397 fax
www.trinity-electronics.com
Custom Electronics Design and Manufacturing
