At 08:19 AM 5/9/2010, Olin Lathrop wrote:

>I think watchdogs, particularly internal ones, are way overrated. All they
>are going to do is reset the part when a particular kind of software bug
>occurs.

I think that watchdog timers are a great tool when used appropriately. I normally don't think of them as catching software faults, but rather as one means of getting the processor back on track if it is corrupted by some external event.

I have to preface the following comments with a couple of important observations: I'm not a great programmer. I write code that works well, but it often takes me way longer to get to the finished product than it should. I'm more of a hardware guy who got into writing firmware. I generally look at things more from a discrete hardware perspective than a 'real' software person would.

That said, here goes.

I tend to structure my programs in one of two distinct ways: simple programs are along the lines of a cooperative multi-tasking loop; more complex timing-type programs are more of a linear progression from start to finish.

The simple multi-tasking loop is just that: a loop that repeats at a (usually) 1ms rate. It starts at the top and does everything that it needs to do. When it reaches the end of the loop, it calls the background task. The background task kicks the watchdog, then looks after all background stuff - RTCC, a/d, timers, SPI communications, etc. Upon return from the background task, the main loop jumps back to its beginning and the whole thing repeats.

I mentioned that the main loop can function as a simple cooperative multi-tasking system. It does that by implementing each task as a state machine. Each state machine yields its time slot when it has nothing further to do. The idea is that each separate task needs only a few (to a few dozen) cycles - there is lots of room and lots of time for lots of state machines.

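The loop described above could be sketched roughly like this in C. This is only an illustration under stated assumptions, not the code from any actual project: every name here (kick_watchdog, background_task, led_task, run_main_loop, the counters) is hypothetical, the hardware is replaced by plain variables so the sketch can be compiled on a PC, and the 1ms pacing is simulated by incrementing a tick counter instead of waiting on a real timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hardware. On a real part, kick_watchdog()
   would clear the internal WDT or pulse the external supervisor's input. */
static unsigned wdt_kicks;   /* counts kicks so the sketch can be checked on a PC */
static uint32_t tick_ms;     /* 1 ms time base maintained by the background task */

static void kick_watchdog(void) { wdt_kicks++; }

/* Background task: called from the main loop once per pass. */
static void background_task(void)
{
    kick_watchdog();
    tick_ms++;   /* assumption: real code would wait here for the 1 ms tick,
                    then service RTCC, A/D, timers, SPI, etc. */
}

/* Wrap-safe "has this deadline passed?" test. */
static bool deadline_passed(uint32_t deadline)
{
    return (int32_t)(tick_ms - deadline) >= 0;
}

/* Example task: blink an LED, written as a two-state machine. */
typedef enum { LED_IS_OFF, LED_IS_ON } led_state_t;
static led_state_t led_state = LED_IS_OFF;
static uint32_t led_deadline;
static unsigned led_toggles;

static void led_task(void)
{
    if (!deadline_passed(led_deadline))
        return;                          /* nothing to do: yield the time slot */

    switch (led_state) {
    case LED_IS_OFF: led_state = LED_IS_ON;  break;  /* would drive the pin high */
    case LED_IS_ON:  led_state = LED_IS_OFF; break;  /* would drive the pin low */
    }
    led_toggles++;
    led_deadline = tick_ms + 500;        /* next change in 500 ms */
}

/* Main loop: run every task once, then call the background task.
   Firmware would loop forever; 'passes' exists only so the sketch terminates. */
static void run_main_loop(unsigned passes)
{
    while (passes--) {
        led_task();
        /* ...other state-machine tasks would go here... */
        background_task();
    }
}
```

For example, calling run_main_loop(1001) simulates just over one second: the watchdog is kicked once per pass (1001 times) and the LED changes state three times, at 0, 500 and 1000 ms. In firmware, main() would initialise the hardware and then run the same loop body forever.
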
Interrupts mesh nicely with this concept: each ISR does what it needs to do, then sets a flag to tell the foreground task that it needs to deal with something. Again, individual ISRs tend to be quick and short.

The longer timing-type programs aren't as well suited to a simple, relatively short loop. Instead, they function more like traditional programs, with a start and an end. A typical example is what I use in our gas-fired catalytic-heater industrial process oven controllers: a start-up cycle of many steps followed by an operate cycle.

In those programs, each phase of a cycle is designed to do or wait for one particular thing. In the oven controllers, it's waiting for either temperature or time or both. While it's waiting, it repeatedly calls the background task. Again, the background task looks after all of the house-keeping: kick the watchdog, deal with RTCC, timers, slow SPI communications (one clock edge per tick), a/d, etc. In my oven controllers, the background task also contains the entire remote communications task as well as calling one or two other self-contained tasks.

In this case, even though the main loop is held (stopped) at discrete points within each phase of the various cycles, it gives the illusion of multi-tasking because the background task is being called repeatedly.

Interrupts also fit into this concept nicely. In this case, though, it's usually the background task that is looking for and dealing with flags set by ISRs.

Any projects that I build that have safety issues (such as the gas-fired oven controllers mentioned above) have more than one watchdog. The external watchdog timer I've been using for many years now is the Xicor X5043 system supervisor - it contains a watchdog timer, power-supply supervisor and 512 bytes of eeprom. I still use that eeprom to store configuration information even though modern PICs have built-in eeprom storage - mostly out of inertia, I guess.
But the Xicor chips have been bullet-proof - I *never* have issues with corrupted eeprom contents.

The other thing that the Xicor chip does is drive an external timer that is used to disable certain outputs if needed. Basically, this timer is driven by the reset line and is a pulse-stretcher with a period about two or three times the watchdog reset repetition period. It keeps hazardous outputs disabled if the processor can't keep the external watchdog happy, as well as when the power supply is below safe operating levels.

Anyway, the point I'm trying to make is that I do NOT count on the watchdog(s) to catch software errors. That's not its job! Instead, the watchdog is used both to recover if the system is corrupted by some external event and to disable hazardous outputs if the system is not working correctly.

The other point I should make is that I do NOT use interrupts to call my background task. That would defeat most of the safety aspects of the watchdog - it's quite possible that the foreground task is off in lala land while the timer interrupt is still working properly. Instead, the background task must be called by the foreground. Only then is the watchdog timer reset.

I'm really interested in hearing other people comment on this subject - it's a great learning opportunity.

Olin - many thanks for making your post!

dwayne

--
Dwayne Reid
Trinity Electronics Systems Ltd
Edmonton, AB, CANADA
(780) 489-3199 voice
(780) 487-6397 fax
www.trinity-electronics.com
Custom Electronics Design and Manufacturing