Avoiding corruption in EEPROM Memory

David VanHorn says:

I'm going to center this around the Atmel AVR8515, primarily because I just wrote a test routine for it (linked, see below) and because it's really pretty typical of most microcontrollers.

By EEPROM, I mean memory within the micro, that is held without power, indefinitely, and is re-writable under program control. Much of this is also applicable to external EEPROM chips, but there are several versions of terminology out there.
There are a few things that you must do, when using EEPROM memory in a micro.

First, you must assure that power will remain in spec, the whole time that the micro is writing.
This should be obvious, but for some reason, it seems that some people expect the write to happen, even if the power fails. Unfortunately, this is not going to happen. The micro uses an internal charge pump to develop the programming voltage, so any variation in VCC is doubled or tripled at the charge pump output.
In order to make sure that you always have power during write, you have to implement something in hardware that will signal you when power begins to fail, before any write begins, and of course you must check that input before starting any write.
What happens if you are in the middle of writing a string, and the power is now failing? Well.. That's your problem! You are the system engineer. I'd suggest a flag that indicates that the data is valid, which of course you clear first, and set last. This way, an interrupted write is clearly junk data, once the power comes back on.
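A minimal sketch of the "clear first, set last" flag, combined with the power-fail check described above. It runs against a simulated EEPROM array so it can be tried on a host; the record layout, the marker value, and the names `power_is_good`, `ee_write` and `record_write` are assumptions for illustration, not taken from any particular micro or from the linked test routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EE_SIZE        64u
#define REC_FLAG_ADDR  0u      /* hypothetical layout: flag byte, then data */
#define REC_DATA_ADDR  1u
#define REC_VALID      0xA5u   /* arbitrary marker, neither 0x00 nor 0xFF */
#define REC_INVALID    0x00u

static uint8_t sim_eeprom[EE_SIZE];   /* stands in for the real EEPROM */
static int sim_writes_left = -1;      /* -1: power never fails; n: fails after n writes */

/* Stands in for reading the external power-fail detector input. */
static bool power_is_good(void) { return sim_writes_left != 0; }

static void ee_write(uint16_t addr, uint8_t value)
{
    assert(addr < EE_SIZE);
    sim_eeprom[addr] = value;
    if (sim_writes_left > 0)
        sim_writes_left--;
}

/* Clear the flag first, write the data, set the flag last.
   Returns false if power was failing; once the flag has been cleared,
   an abandoned write leaves the record marked invalid. */
bool record_write(const uint8_t *data, size_t len)
{
    if (len > EE_SIZE - REC_DATA_ADDR || !power_is_good())
        return false;                       /* nothing touched yet */
    ee_write(REC_FLAG_ADDR, REC_INVALID);
    for (size_t i = 0; i < len; i++) {
        if (!power_is_good())
            return false;                   /* flag stays cleared */
        ee_write((uint16_t)(REC_DATA_ADDR + i), data[i]);
    }
    if (!power_is_good())
        return false;
    ee_write(REC_FLAG_ADDR, REC_VALID);
    return true;
}

bool record_is_valid(void) { return sim_eeprom[REC_FLAG_ADDR] == REC_VALID; }
```

On power-up the application would call `record_is_valid()` before trusting the data. Because only the exact marker value counts as valid, an erased or half-written flag byte also reads as invalid.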
Second, you have to ensure that RESET won't happen during a write. This varies: some micros will complete the write if it had already started before the RESET. Otherwise you need external hardware again, plus an output pin that tells the reset circuit when reset must not occur. Of course it would be bad if your micro got hung in a loop, asserting 'NO_RESET' constantly, so I'd add an external R-C limiter, so that the micro can only hold off reset for some short but reasonable time, like maybe twice the specified maximum length of an EEPROM write cycle.

Third, you have to make sure that YOU don't blow the data. A bad jump, like into the EEPROM_WRITE routine, could ruin your whole day. What I do here, is that I always leave the EEPROM data pointer pointing to an EEPROM cell that I don't use. Any accidental writes then are harmless, as long as they don't also manage to corrupt the EEPROM data pointer.
Nothing's perfect.
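The parked-pointer idea can be sketched like this. The address and data "registers" here are plain variables standing in for the micro's EEPROM address and data registers, and the choice of the last cell as the decoy is an assumption; pick whichever cell your application never uses.

```c
#include <assert.h>
#include <stdint.h>

#define EE_SIZE      64u
#define EE_PARK_ADDR (EE_SIZE - 1u)   /* hypothetical unused cell used as a decoy */

static uint8_t  sim_eeprom[EE_SIZE];
static uint16_t ee_addr_reg = EE_PARK_ADDR;  /* stands in for the address register */
static uint8_t  ee_data_reg;                 /* stands in for the data register */

/* The low-level routine a bad jump might land in: it writes whatever
   is in the data register to whatever address is currently latched. */
void ee_write_latched(void)
{
    assert(ee_addr_reg < EE_SIZE);
    sim_eeprom[ee_addr_reg] = ee_data_reg;
}

/* Normal write path: point at the target, write, then park again. */
void ee_write_byte(uint16_t addr, uint8_t value)
{
    ee_addr_reg = addr;
    ee_data_reg = value;
    ee_write_latched();
    ee_addr_reg = EE_PARK_ADDR;   /* never leave the pointer on live data */
}
```

A stray call to `ee_write_latched()` then lands on the decoy cell. As noted above, this does nothing for a fault that also corrupts the address register.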
Fourth, you have to respect the write cycle timing internally. Most micros have a flag mechanism that you must check, to be sure that there isn't a write in progress, before you begin the next write, or perform a read.
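As a rough illustration of checking the write-in-progress flag before every access, here is a host-side sketch. The flag is simulated with a countdown, and the poll limit is an added assumption so that a stuck flag cannot hang the caller forever; on a real part the flag name and the access sequence come from the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EE_SIZE 64u

static uint8_t  sim_eeprom[EE_SIZE];
static unsigned sim_busy_polls;   /* polls left until the simulated write finishes */

/* Stands in for reading the hardware write-in-progress flag. */
static bool ee_write_in_progress(void)
{
    if (sim_busy_polls > 0) {
        sim_busy_polls--;
        return true;
    }
    return false;
}

/* Poll until the EEPROM is idle; give up after max_polls busy readings. */
bool ee_wait_ready(unsigned max_polls)
{
    while (ee_write_in_progress()) {
        if (max_polls-- == 0)
            return false;
    }
    return true;
}

bool ee_write_checked(uint16_t addr, uint8_t value, unsigned max_polls)
{
    assert(addr < EE_SIZE);
    if (!ee_wait_ready(max_polls))
        return false;
    sim_eeprom[addr] = value;
    sim_busy_polls = 4;           /* the simulated write stays busy for a few polls */
    return true;
}

bool ee_read_checked(uint16_t addr, uint8_t *out, unsigned max_polls)
{
    assert(addr < EE_SIZE);
    if (!ee_wait_ready(max_polls))
        return false;             /* never read while a write is in progress */
    *out = sim_eeprom[addr];
    return true;
}
```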

There are other possible causes of corrupted data. I have caused an Atmel AVR to corrupt the EEPROM by 'glitching' the xtal with a screwdriver. I can't really say exactly how this affected the system, and I've not been able to repeat it since. This is hardly something that you would expect to happen in normal use though.
Mythology: I have heard reports of EEPROM corruption in Atmel AVR processors. My own testing does not reveal a problem, but I don't have a large number of systems here to test on.
I did generate a special test routine for this. It first checks the EEPROM for some signature bytes. If they are not present, then it 'paves' the EEPROM with test data. Once that is done, it loops, checking EEPROM, and showing the current test address on a display.
At this point, it is safe to power down the device, and on the return of power, the code will check the entire EEPROM array, and then continue looping, checking the EEPROM.
If it detects a bad cell, it notes this on the display, then re-paves the EEPROM.
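
The flow of the test routine can be sketched as follows. This is not the downloadable code, only an outline of the pave-and-check loop described above, run against a simulated array; the signature bytes and the address-derived test pattern are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EE_SIZE 512u
#define SIG0    0x55u   /* hypothetical signature bytes at addresses 0 and 1 */
#define SIG1    0xAAu

static uint8_t sim_eeprom[EE_SIZE];

static uint8_t ee_rd(uint16_t a) { assert(a < EE_SIZE); return sim_eeprom[a]; }
static void ee_wr(uint16_t a, uint8_t v) { assert(a < EE_SIZE); sim_eeprom[a] = v; }

/* Assumed test pattern: derived from the address, so data that ends up
   in the wrong cell is detected as well. */
static uint8_t pattern(uint16_t a) { return (uint8_t)(a ^ (a >> 8) ^ 0x3Cu); }

bool ee_has_signature(void) { return ee_rd(0) == SIG0 && ee_rd(1) == SIG1; }

/* Fill the array with test data; the signature is written last. */
void ee_pave(void)
{
    for (uint16_t a = 2; a < EE_SIZE; a++)
        ee_wr(a, pattern(a));
    ee_wr(0, SIG0);
    ee_wr(1, SIG1);
}

/* Returns the address of the first bad cell, or -1 if every cell checks out. */
int32_t ee_verify(void)
{
    for (uint16_t a = 2; a < EE_SIZE; a++)
        if (ee_rd(a) != pattern(a))
            return a;
    return -1;
}

/* One pass of the loop: pave if the signature is missing, otherwise
   check every cell and re-pave if a bad one is found. */
int32_t ee_test_pass(void)
{
    if (!ee_has_signature()) {
        ee_pave();
        return -1;
    }
    int32_t bad = ee_verify();
    if (bad >= 0)
        ee_pave();
    return bad;
}
```

The real routine also shows the current test address and any failure on a display, which is left out here.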

I did not implement the lockouts described above, because this is only intended for internal testing, and I may want to deliberately induce the sort of faults that the lockouts would prevent.

The code is available for download here
Please let me know what you find.

See also:

Andries Tip refers to /Techref/member/AT-planet-T9/safeeeprom.htm:

In one of my applications I could not be sure whether the power would stay on during the EEPROM writes, but I still needed to update and remember some more or less important stuff.

Here is a piece of the code I used. It involves a majority check: the value is written to five successive locations. On a read, each bit of the result is taken from the corresponding bit position in all five bytes, and gets the value that is most abundant there: three or more bits of the same value.
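
The majority check can be sketched in C as follows. This is an illustration of the technique described above, not Andries Tip's original code; the simulated EEPROM array and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define EE_SIZE 64u
#define COPIES  5u

static uint8_t sim_eeprom[EE_SIZE];   /* stands in for the real EEPROM */

/* Bitwise majority vote: each result bit takes the value held by
   three or more of the five copies. */
uint8_t majority5(const uint8_t copy[COPIES])
{
    uint8_t result = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        unsigned ones = 0;
        for (unsigned i = 0; i < COPIES; i++)
            ones += (copy[i] >> bit) & 1u;
        if (ones >= 3)
            result |= (uint8_t)(1u << bit);
    }
    return result;
}

/* Write the same value to five successive locations. */
void ee_write_voted(uint16_t base, uint8_t value)
{
    assert(base + COPIES <= EE_SIZE);
    for (uint16_t i = 0; i < COPIES; i++)
        sim_eeprom[base + i] = value;
}

/* Read the five copies back and return the voted value. */
uint8_t ee_read_voted(uint16_t base)
{
    assert(base + COPIES <= EE_SIZE);
    return majority5(&sim_eeprom[base]);
}
```

Because the vote is taken per bit, the value survives as long as no single bit position is wrong in more than two of the five copies. The price is five cells and five write cycles per stored byte.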

Please have a look:
http://www.dvanhorn.org/Micros/All/Eeprom.php

From Stef Mientki at http://oase.uci.kun.nl/~mientki/pic/libs_hw/eeprom_problems.html:

In December 2004 an interesting discussion took place about an often overlooked spec of the EEPROM.
I too have done a project that has problems, probably due to this phenomenon. I even wrote a library to increase the EEPROM endurance by spreading the information over the complete EEPROM, but that violates this rule and probably makes things worse.
What's the problem?
Normal endurance (parameter D124) is worst case 10^6 (up to 85 Celsius).
But writing to some place in EEPROM reduces the endurance of the other EEPROM locations by
- a factor 10 (parameter D120) up to 85 Celsius
- a factor 100 (parameter D120A) above 85 Celsius
So my conclusion is that I have to write a new library that does not spread the information over the EEPROM, but instead keeps the number of writes in a counter and refreshes the total EEPROM when the counter exceeds a certain value.
From Microchip tech support:
The data sheet is being updated in this area.
The refresh works as follows:

Any individual EEPROM cell has an absolute maximum endurance.
If you extend the endurance of a piece of data by spreading the value across a few cells, you can increase the data's endurance to n * endurance, where n is the number of cells you spread the value across.
If you are updating most of the EEPROM array but have a few cells that never get touched, the data in those cells can be disturbed once 10 million erase cycles have been performed anywhere in the array.

Refresh means you read and re-write the little used cells before the entire array sees 10 million writes.
The ultimate endurance of the DATA EEPROM memory is (size * cell endurance). However, if you never touch one address, you can expect that address to be corrupted before you reach the ultimate endurance. That is why we recommend periodic refresh of less frequently used memory locations.
from Bob Ammerman:
Basically the issue is this:
Any write to EEPROM memory (any location) degrades the signal stored in _all_ EEPROM locations. Thus, the value of any given location can be corrupted, if there are too many writes to _other_ locations between writes to that location.
The datasheet will define parameters for this. For many, but not all PICs, these parameters are called D120 and D120A.
For an example, looking at the datasheet for the PIC18F1220/PIC18F1320 we find (marked as advance information in the copy I have):
Parameter D120 - Byte Endurance - Min=100K, Typ=1M -- This is the number of times any single byte of EEPROM can be written to reliably.
Parameter D124 - Number of total ERASE/WRITE cycles before refresh - Min = 1M, Typ = 10M -- Any bytes that have _not_ been written need to be refreshed before D124 total writes to other bytes are done.
So, what does this really mean?
Well, if you do less than 1M total writes to the EEPROM (and 100K to any one byte) in the life of your product, you have no problem.
If you are using some sort of logging scheme where every EEPROM cell you are using is written to 'regularly', you have no problem.
The typical case where you do have a problem is where some EEPROM locations are quite static and seldom written, while others are actively written. A primary example of this is where configuration or calibration data is stored in part of the EEPROM and logged data in the rest. In this case you would have to perform a periodic refresh of the configuration/calibration data (before doing 1M writes to the logging data area).
A refresh, of course, is as simple as reading and then immediately rewriting the same memory location with the same value.
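A minimal sketch of a counter-triggered refresh, assuming a layout with a small, seldom-written configuration block and a logging area. The layout, the names, and the threshold are assumptions; the threshold should be derived from the total-writes-before-refresh figure in your part's datasheet, with margin. The counter lives in RAM here to keep the example short, so a real design would also need a way to keep the count across power cycles.

```c
#include <assert.h>
#include <stdint.h>

#define EE_SIZE       256u
#define CONFIG_START  0u    /* hypothetical: seldom-written config/calibration data */
#define CONFIG_LEN    16u

static uint8_t  sim_eeprom[EE_SIZE];          /* stands in for the real EEPROM */
static uint32_t refresh_threshold = 500000ul; /* assumed: half of a 1M minimum spec */
static uint32_t writes_since_refresh;         /* RAM only in this sketch */
static uint32_t refresh_count;

static void ee_raw_write(uint16_t addr, uint8_t value)
{
    assert(addr < EE_SIZE);
    sim_eeprom[addr] = value;
    writes_since_refresh++;    /* every write to the array counts */
}

/* Refresh: read each static cell and rewrite it with the same value. */
void ee_refresh_config(void)
{
    for (uint16_t a = CONFIG_START; a < CONFIG_START + CONFIG_LEN; a++)
        ee_raw_write(a, sim_eeprom[a]);
    writes_since_refresh = 0;
    refresh_count++;
}

/* Write path for the busy area: count the write and refresh the static
   block once the array has seen too many writes since the last refresh. */
void ee_log_write(uint16_t addr, uint8_t value)
{
    ee_raw_write(addr, value);
    if (writes_since_refresh >= refresh_threshold)
        ee_refresh_config();
}
```

Each refresh costs one write cycle on every configuration cell, so the threshold should not be set so low that the refreshes themselves use up a noticeable share of those cells' byte endurance.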
One useful way to avoid the refresh issue is to store your configuration/calibration data in program memory on those PICs that support self writing. This will work if you can live within the endurance constraint of program memory (parameter D130 for our example PICs, Min=10K, Typ=100K), and can afford to have your PIC 'freeze' while writing the parameters.
