=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Date: Tue, 25 Jan 2000 09:30:13
From: Nikolai Golovchenko
To: pic microcontroller discussion list
Subject: Re: EEPROM endurance/error correction
--------------------------------------------------------------------------------

On Monday, January 24, 2000 Roland Andrag wrote:

> Hello everyone!
> A thread fairly similar to the one I'm about to (hopefully) start came up a
> few weeks ago, but I want to pose the same question in a more open way. So
> here goes:
> What is the best way to check for/detect failure of an EEPROM location when
> the endurance limit is reached, and then move on to another location? My
> chain of thought leads me to something like:
> 1. Have a pointer (in EEPROM) pointing to where the variable of interest is
> repeatedly stored;

Bad idea. If you assume the EEPROM can fail, then a pointer stored in it
cannot be trusted either. You have to know that the area of EEPROM you read
has no errors. A simple CRC check can help.

> 2. Store the variable a couple of times in successive locations (say three
> times);

AFAIK, EEPROM fails because of writes. So chances are that all these
successive locations will reach their write limit at about the same time.

> 3. When reading, if all the stored values (all three in line 2) do not
> agree, use the value given by the majority;
> 4. Once all three positions do not agree, move on to three new locations and
> update the pointer.
> At first I considered not mentioning this chain of thought so as not to
> influence anyone else's ideas, but did so since I would like comments on it.
> So if you have a different/better idea, please mention it!
> Thanks,
> Roland

So I think you have to break the EEPROM into several banks, with only one
bank in use at a time until it is no longer usable because of an EEPROM write
failure. Then another bank is selected, and so on. When all banks are bad
(or, better, earlier), the PIC has to signal for maintenance.
This is similar to what Dwaine Reed described a while ago. The current bank
is selected on power-up and the bank number is stored in RAM. Each bank
should have two parts (like two FATs). Each part will have a simple CRC
(XORing all data bytes will do). Having two parts protects against power-down
during writes.

This scheme consists of the write stage and power-up initialization.

1) WRITE STAGE.

There are two problems that have to be dealt with:
(a) write error due to EEPROM failure, and
(b) power-down during write.

As the manual says, a "write error" can be detected by the WRERR flag in
EECON1. The flag gets set when a write is interrupted by a reset (MCLR or
Watchdog). As I understand it, this flag is useful for dealing with brownout,
and the watchdog can signal that the write operation is taking too long (very
vague, because the watchdog frequency depends heavily on temperature and on
the device). So WRERR gives no useful information in cases (a) and (b).

Instead, the written data should be verified. If the data don't match, then
there is an error and the bank should be cancelled. To cancel a bank, both of
its CRC codes must be made not to match their parts, so that the power-up
routine can decide that the bank is bad.

(a) Steps for write:
1) Write the needed data into the first part of the current bank.
2) Verify each byte. If there is no match, cancel the bank, then copy the
   other (still good) part into a new bank and switch to this bank; repeat
   from 1.
3) Compute and write the CRC to the part.
4) Copy the first part into the second one.

(b) If power-down occurs at any time during (a), then at least one part's CRC
will still be right, even if the other is wrong.

2) POWER-UP INITIALIZATION.

The purpose of this routine is to find the current bank and restore it if
there was a power-down during a write. A bad bank will have bad CRC codes for
both of its parts. A good bank will have at least one good CRC. If only one
CRC is good for the bank, copy that part into the other to restore the bank.

I haven't had a chance yet to implement all this, since for my applications
it was safe to assume that the EEPROM doesn't fail within the device's
lifetime.

Bye,
Nikolai
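
The write stage and power-up initialization described in this message can be
sketched in C roughly as follows. This is only an illustration, not code from
the post, and it rests on several assumptions: the EEPROM is simulated by a
RAM array (eeprom[]) with a second array (worn[]) marking dead cells; all
names and sizes (NUM_BANKS, DATA_LEN, store, power_up, load and the helpers)
are hypothetical; the whole record is rewritten on every store, so the "copy
the still good part into the new bank" step is replaced by simply writing the
new record there; and if both parts are valid but differ, the first part is
taken as the newer one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the post does not fix any sizes or names. */
#define NUM_BANKS 4
#define DATA_LEN  6                  /* data bytes per part      */
#define PART_LEN  (DATA_LEN + 1)     /* data plus one check byte */
#define BANK_LEN  (2 * PART_LEN)     /* two parts per bank       */

uint8_t  eeprom[NUM_BANKS * BANK_LEN]; /* RAM stand-in for the EEPROM     */
uint8_t  worn[NUM_BANKS * BANK_LEN];   /* simulation: nonzero = dead cell */
unsigned current_bank;                 /* chosen at power-up, kept in RAM */

static uint8_t ee_read(unsigned a) { return eeprom[a]; }
static void ee_write(unsigned a, uint8_t v) { if (!worn[a]) eeprom[a] = v; }

/* Write one byte (only if it differs, to save endurance) and verify it. */
static int ee_write_verified(unsigned a, uint8_t v) {
    if (ee_read(a) != v) ee_write(a, v);
    return ee_read(a) == v;
}

/* XOR of all data bytes of the part starting at base. */
static uint8_t part_check(unsigned base) {
    uint8_t x = 0;
    for (unsigned i = 0; i < DATA_LEN; i++) x ^= ee_read(base + i);
    return x;
}

static int part_ok(unsigned base) {
    return part_check(base) == ee_read(base + DATA_LEN);
}

/* Steps 1-3: write data, verify each byte, then write the check byte. */
static int write_part(unsigned base, const uint8_t *data) {
    uint8_t x = 0;
    for (unsigned i = 0; i < DATA_LEN; i++) {
        if (!ee_write_verified(base + i, data[i])) return 0;
        x ^= data[i];
    }
    return ee_write_verified(base + DATA_LEN, x);
}

/* Cancel a bank: make both check bytes mismatch their parts.
   Not handled here: a worn-out cell holding the check byte itself. */
static void cancel_bank(unsigned bank) {
    assert(bank < NUM_BANKS);
    for (unsigned p = 0; p < 2; p++) {
        unsigned base = bank * BANK_LEN + p * PART_LEN;
        ee_write(base + DATA_LEN, (uint8_t)~part_check(base));
    }
}

/* Write stage. Returns 0 on success, -1 when every bank is worn out. */
int store(const uint8_t *data) {
    while (current_bank < NUM_BANKS) {
        unsigned p1 = current_bank * BANK_LEN;
        if (write_part(p1, data) && write_part(p1 + PART_LEN, data))
            return 0;
        cancel_bank(current_bank);   /* verify failed: retire this bank */
        current_bank++;
    }
    return -1;                       /* time to signal for maintenance */
}

/* Power-up: find the first bank with a good part and repair the other. */
int power_up(void) {
    uint8_t buf[DATA_LEN];
    for (unsigned b = 0; b < NUM_BANKS; b++) {
        unsigned p1 = b * BANK_LEN, p2 = p1 + PART_LEN;
        int ok1 = part_ok(p1), ok2 = part_ok(p2);
        if (!ok1 && !ok2) continue;             /* cancelled bank */
        unsigned src = ok1 ? p1 : p2, dst = ok1 ? p2 : p1;
        for (unsigned i = 0; i < DATA_LEN; i++) buf[i] = ee_read(src + i);
        (void)write_part(dst, buf);             /* repair result ignored here */
        current_bank = b;
        return (int)b;
    }
    current_bank = NUM_BANKS;                   /* no usable bank left */
    return -1;
}

/* Read the record from the first part of the current bank. */
void load(uint8_t *out) {
    assert(current_bank < NUM_BANKS);
    for (unsigned i = 0; i < DATA_LEN; i++)
        out[i] = ee_read(current_bank * BANK_LEN + i);
}
```

In this simulation, corrupting a byte in the first part of the current bank
and calling power_up() restores it from the second part, and marking a data
cell of bank 0 as worn makes the next store() cancel bank 0 and continue in
bank 1. Note that the zero-filled array happens to look like a valid bank 0;
a real device would probably need a one-time step that writes a valid first
bank before this scheme is used.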