A short spiel on 1-bit error correction codes

There's an error correction code that separates the value bits from the error correction bits, and the difference between the calculated and actual error correction bits is the position of the bit that's wrong. Very nice, eh?

Error correction codes are a way to represent a set of symbols so that if any 1 bit of the representation is accidentally flipped, you can still tell which symbol it was. For example, you can represent two symbols x and y in 3 bits with the values x=111 and y=000. If you flip any one of the bits of these values, you can still tell which symbol was intended.

If more than 1 bit changes, you can't tell, and you probably get the wrong answer. So it goes; 1-bit error correction codes can only correct 1-bit changes.
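
If you'd rather see that in code, here's a minimal sketch in C of decoding the 3-bit code above by majority vote (decode3 is a made-up name, just for illustration):

    #include <stdio.h>

    /* Decode the 3-bit code x=111, y=000 by majority vote: whichever
       symbol the received value is within 1 bit of wins. */
    static int decode3(unsigned v)      /* v holds the 3 received bits */
    {
        int ones = (v & 1) + ((v >> 1) & 1) + ((v >> 2) & 1);
        return (ones >= 2) ? 1 : 0;     /* 1 means x, 0 means y */
    }

    int main(void)
    {
        printf("%d\n", decode3(5));  /* 101: two bits set, decodes to x */
        printf("%d\n", decode3(2));  /* 010: one bit set, decodes to y */
        return 0;
    }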

If b bits are used to represent the symbols, then each symbol will own 1+b values: the value representing the symbol, and the values differing from it by 1 bit. In the 3-bit example above, y owned 1+3 values: 000, 100, 010, and 001. Representing n symbols in b bits will consume n*(1+b) values.

If there is a 1-bit error correction code of b bits for n symbols, then n*(1+b) <= 2^b. An x-bit error correction code requires that n*( (b choose 0) + (b choose 1) + ... + (b choose x) ) <= 2^b.

Suppose you want a 1-bit error correction code for 2^11 symbols. Since (14+1)*2^11 > 2^14 but (15+1)*2^11 = 2^15, the code must have at least 15 bits.
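
Here's a little C sketch of that counting argument, if you want to check the arithmetic (min_bits is a made-up name):

    #include <stdio.h>

    /* Smallest b with n*(1+b) <= 2^b: the fewest bits a 1-bit error
       correction code for n symbols could possibly use.  (Assumes
       n*(1+b) doesn't overflow 64 bits.) */
    static int min_bits(unsigned long long n)
    {
        int b;
        for (b = 1; b < 63; ++b)
            if (n * (1ULL + b) <= (1ULL << b))
                return b;
        return -1;
    }

    int main(void)
    {
        printf("%d\n", min_bits(1ULL << 11));   /* prints 15 */
        return 0;
    }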

OK. Now what exactly is the code? Can we name the symbols 0..2^11-1 and confine the error correction to just four bits? Yes. Can we calculate four error correction bits easily? Yes. Can we recover easily when an error occurs? Yes. Here's how.

        1     2  3  4     5  6  7  8  9 10 11
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

In the diagram above, we have our 15 bits. Bits 1,2,4,8 (the powers of 2) are the error correction bits. All the others are symbol bits, and store the name of the symbol (an 11-bit value).

Below is a table of the values 1..15 and their binary representations, low bit first:

       1 2 4 8
    1  1 0 0 0
    2  0 1 0 0
    3  1 1 0 0
    4  0 0 1 0
    5  1 0 1 0
    6  0 1 1 0
    7  1 1 1 0
    8  0 0 0 1
    9  1 0 0 1
   10  0 1 0 1
   11  1 1 0 1
   12  0 0 1 1
   13  1 0 1 1
   14  0 1 1 1
   15  1 1 1 1

To represent the symbol 10101000101, that is

  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
        1     0  1  0     1  0  0  0  1  0  1

we need to figure out what the error correction bits are. Simply take the binary representations of the positions of the set symbol bits (3,6,9,13,15) and XOR them together:

    1 1 0 0   3
    0 1 1 0   6
    1 0 0 1   9
    1 0 1 1  13
 ^  1 1 1 1  15
 -----------
    0 1 1 1

to get the error code bits (1=0,2=1,4=1,8=1). Then fill them in to complete the code.

  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0  1  1  1  0  1  0  1  1  0  0  0  1  0  1

The finished code, 011101011000101, is sent to the remote side, which recomputes the error code bits from the symbol bits, and compares that to the error code bits actually received.

That's the definition of the code.
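
To make that concrete, here's a sketch of the encoder in C (encode15 is a made-up name; the low bit of the returned word is position 1):

    #include <stdio.h>

    /* Encode an 11-bit symbol into the 15-bit code described above.
       Symbol bits fill the non-power-of-2 positions in order; the
       error correction bits are the XOR of the positions of the set
       symbol bits, stored at positions 1,2,4,8. */
    static unsigned encode15(unsigned sym)
    {
        unsigned code = 0, parity = 0;
        int pos, j = 0;

        for (pos = 1; pos <= 15; ++pos) {
            if ((pos & (pos - 1)) == 0)      /* skip positions 1,2,4,8 */
                continue;
            if ((sym >> j++) & 1) {
                code |= 1u << (pos - 1);
                parity ^= pos;               /* XOR of set positions */
            }
        }
        for (j = 0; j < 4; ++j)              /* fill in bits 1,2,4,8 */
            if ((parity >> j) & 1)
                code |= 1u << ((1 << j) - 1);
        return code;
    }

    int main(void)
    {
        /* 0x515 is the example symbol 10101000101, first bit lowest;
           the result 0x51ae is 011101011000101, position 1 lowest. */
        printf("%x\n", encode15(0x515));
        return 0;
    }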

Now what if some bit gets flipped in transit? Suppose a symbol bit (13) gets flipped, so you receive 011101011000001 instead of 011101011000101. The remote side recomputes the error code bits from the set symbol bits it received

    1 1 0 0   3
    0 1 1 0   6
    1 0 0 1   9
 ^  1 1 1 1  15    (13 was received 0, not 1)
 -----------
    1 1 0 0

and XORs that with the error code bits it actually received

    0 1 1 1  received
  ^ 1 1 0 0  calculated
 -----------
    1 0 1 1

and finds it's not 0. In fact, it's the binary representation of 13, the bit that was flipped.

Let's try again, flipping one of the error correction bits instead (4). So the value received is 011001011000101 instead of 011101011000101. The remote side calculates the error correction bits from the set symbol bits received

    1 1 0 0   3
    0 1 1 0   6
    1 0 0 1   9
    1 0 1 1  13
 ^  1 1 1 1  15
 -----------
    0 1 1 1

and compares that to the error correction bits actually received

    1 2 4 8

    0 1 0 1  received
  ^ 0 1 1 1  calculated
 -----------
    0 0 1 0

and it's not zero. In fact, it's the binary representation of 4, the bit that was corrupted. The XOR of the calculated and received error correction bits is always either 0000 (no bit got corrupted) or the position of the bit that got corrupted.
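
In code, the receiving side can fold both steps into one pass: XORing together the positions of ALL set bits, error correction bits included, gives the same answer, because the set error correction bits at positions 1,2,4,8 contribute exactly the received error code value. A sketch (correct15 is a made-up name, using the same bit layout as encode15 above):

    #include <stdio.h>

    /* Correct a received 15-bit codeword. The XOR of the positions of
       all set bits is 0 if nothing was flipped, else the position of
       the flipped bit; flip it back. */
    static unsigned correct15(unsigned code)
    {
        unsigned syndrome = 0;
        int pos;

        for (pos = 1; pos <= 15; ++pos)
            if ((code >> (pos - 1)) & 1)
                syndrome ^= pos;

        if (syndrome)
            code ^= 1u << (syndrome - 1);
        return code;
    }

    int main(void)
    {
        unsigned sent = 0x51AE;            /* 011101011000101, pos 1 lowest */
        unsigned got = sent ^ (1u << 12);  /* position 13 flipped in transit */
        printf("%s\n", correct15(got) == sent ? "corrected" : "failed");
        return 0;
    }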

This example was for 15 bits representing 11-bit symbols. The same thing works for any 2^n - 1 bits; the powers of 2 are the error correction bits. (Notice that the first 7 bits of the 15-bit code are an error correction code for 4-bit symbols. One error code is always the prefix of another like that.) The highest symbol bits can be lopped off if you don't have that many symbols.
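
The one-pass check from the sketch above works at any of those sizes too. A sketch generalizing it to codewords of 2^n - 1 bits, for n <= 6 so the word fits in 64 bits (syndrome is a made-up name):

    #include <stdint.h>
    #include <stdio.h>

    /* XOR of the positions of all set bits in a codeword of 2^n - 1
       bits: 0 if clean, else the position of the flipped bit. */
    static unsigned syndrome(uint64_t code, int n)
    {
        unsigned s = 0;
        int pos, len = (1 << n) - 1;

        for (pos = 1; pos <= len; ++pos)
            if ((code >> (pos - 1)) & 1)
                s ^= pos;
        return s;
    }

    int main(void)
    {
        printf("%u\n", syndrome(0x51AE, 4));               /* 0: clean */
        printf("%u\n", syndrome(0x51AE ^ (1u << 12), 4));  /* 13       */
        return 0;
    }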

See, isn't that neat?