UPC / EAN-13 Barcode

UPC codes are perhaps the most commonly encountered kind of barcode.

Encoding

JavaScript code by Birger Nielsen to produce an EAN-13 barcode using two images (one a white pixel, the other a black pixel).
Rescued from http://www.246.dk/ean-13.html, which has been taken down.

<script type="text/javascript">
<!--
{
  alfabet = new Array(
   'AAAAAACCCCCC',
   'AABABBCCCCCC',
   'AABBABCCCCCC',
   'AABBBACCCCCC',
   'ABAABBCCCCCC',
   'ABBAABCCCCCC',
   'ABBBAACCCCCC',
   'ABABABCCCCCC',
   'ABABBACCCCCC',
   'ABBABACCCCCC'
  );
  acode = new Array(  
   '0001101','0011001','0010011','0111101','0100011',
   '0110001','0101111','0111011','0110111','0001011'
  )
  bcode = new Array(  
   '0100111','0110011','0011011','0100001','0011101',
   '0111001','0000101','0010001','0001001','0010111'
  )
  ccode = new Array(  
   '1110010','1100110','1101100','1000010','1011100',
   '1001110','1010000','1000100','1001000','1110100'
  )
  value = new Array(
   '0','1','2','3','4','5','6','7','8','9'
  )

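  ean=prompt("Enter 12 digits","");
  // Accept exactly 12 decimal digits; anything else would corrupt
  // the checksum and the table lookups below.
  eanok = (ean != null) && /^[0-9]{12}$/.test(ean);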
  if (eanok) {
    chksum = 0;
    code = ean;
    for (i = 0; i < ean.length; i++) {
      v = -1;
      for (j = 0; j < value.length; j++) {
        if (value[j] == ean.charAt(i)) {
          if (i % 2 == 0) {
            v=j;
          } else {
            v=3*j;
          }
        }
      }
      chksum += v;
    }
    chksum = chksum % 10;
    chksum = (10 - chksum) % 10;
    ean = ean + value[chksum];
    for (i=0; i<value.length; i++) {
      if (value[i] == ean.charAt(0)) {
        alfstr = alfabet[i];
      }
    }
    wstr = "101";
    for (i = 0; i < 6; i++) {
      if (alfstr.charAt(i) == "A") {
        wstr += acode[ean.charAt(i+1)];
      }
      if (alfstr.charAt(i) == "B") {
        wstr += bcode[ean.charAt(i+1)];
      }
    }
    wstr += "01010";
    for (i = 6; i < 12; i++) {
        wstr += ccode[ean.charAt(i+1)];
    }
    wstr += "101";

    astr = '<html>' + "\n" + '<head>' + "\n" + 
           '<title>' + ean  + '<'+'/title>' + "\n" + 
           '<'+'/head>' + "\n";
    astr +='<body>' + "\n";
    astr +='<table border=0 cellspacing=0 cellpadding=0>' + "\n" +
           '<tr>';
    for (i = 0; i < wstr.length; i++) {
      astr += '<td>' + '<img src="http://www.246.dk/Pbc-' + 
              wstr.charAt(i) + '.png" alt="' +
              wstr.charAt(i) + '">' + '<'+'/td>' + "\n";
    }
    astr +='<'+'/tr>' + "\n";
    astr +='<'+'/table>' + "\n";
    astr +='<p>' + ean + '<'+'/p>' + "\n";
    astr +='<'+'/body>' + "\n" + '<'+'/html>' + "\n";
    aPopUp= window.open('','code', 'toolbar=yes,menubar=yes,width=500,height=60');
    ndoc= aPopUp.document;
    ndoc.write(astr);
    ndoc.close();
  }
}
// -->
</script>

Decoding

Rescued from David VanHorn's website: http://www.dvanhorn.org/Barcode/

To decode a barcode, you must understand its structure.

Structure

An excellent reference is from Agilent (née HP): http://www.semiconductor.agilent.com/barcode/sg/Misc/upc.html

I won't attempt to reproduce all their information here, but I will cover a few points that I think need stressing, and present some things in different ways that I think make the job easier.

The UPC/EAN codes are an excellent place to start, because they are very well thought out. Other codes have various problems that can make it possible to get a "good" but wrong read. As far as I know, this is impossible in UPC/EAN, provided you follow all the rules.

If you construct a decoder for EAN-13, then you will also decode UPC. While this makes it seem that UPC is a subset of EAN, in fact EAN is an extension of UPC.

The first thing to look at is the structure of a basic UPC barcode.

Each barcode consists of a number of black and white "elements" of some width. In UPC/EAN, each element is either "1", "2", "3", or "4" units wide. The unit of width is called a "module". Each digit in UPC/EAN consists of four elements, with a total width of seven modules. Never six, never eight, never 7.1.

The digits are encoded in two groups of six digits. In the EAN code, a thirteenth digit is encoded in the parity bits of the digits in the left half.

Around these digit groups are reference bars, each one module wide. These reference bars are at both ends of the code, and in the middle. (These are the bars that some people insist are the "secret" encoding of "666", but as you will see, they cannot possibly be interpreted this way.)

This next bit will seem complicated, but let that slide, it gets easier later.

For each digit, zero through nine, there are three ways to make the digit. Two methods in the left half, and a third in the right half.

All these are read from left to right, though the pen may move in either direction!

Here are the light and dark patterns for the digits, covering both UPC and EAN.

Digit  Set A (odd)  Set B (even)  Set C (even)
0      0001101      0100111       1110010
1      0011001      0110011       1100110
2      0010011      0011011       1101100
3      0111101      0100001       1000010
4      0100011      0011101       1011100
5      0110001      0111001       1001110
6      0101111      0000101       1010000
7      0111011      0010001       1000100
8      0110111      0001001       1001000
9      0001011      0010111       1110100

This probably looks pretty confusing at first glance.

Things to notice:

UPC uses only charset A and C, EAN uses all three.

UPC uses A in the left half and C in the right half.

EAN uses A and B in the left half, and C in the right half.

EAN ALWAYS begins with an A char on the left half in the forward direction, and a C char on the right half in the reverse direction.

From inspection, it is obvious that the A and C charsets are bit inverses of each other. In a fixed direction, it is impossible to discriminate between A and C based on the width information alone. As far as element widths are concerned, therefore, sets A and C are identical, and one set may be discarded.

It is also obvious that C and B are bit reverses of each other. While B and C are never used together in the same half of a code, it is easy to tell them apart in a fixed direction read. Therefore, either set B or C is essential to extracting the encoded information.

Charsets A and B are both bit-reversed and inverted relative to each other. A single operation (reverse or inverse) will not allow either to be discarded.

Therefore, the C charset can be discarded, and only the A and B charsets used for decoding both UPC and EAN codes.
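
The relationships between the three sets can be checked mechanically. This is only a small sketch: the tables are copied from the listing above, and the helper names `invert` and `reverse` are mine, not part of the original material.

```javascript
// Bit patterns for digits 0-9, copied from the table above.
const SET_A = ['0001101', '0011001', '0010011', '0111101', '0100011',
               '0110001', '0101111', '0111011', '0110111', '0001011'];
const SET_B = ['0100111', '0110011', '0011011', '0100001', '0011101',
               '0111001', '0000101', '0010001', '0001001', '0010111'];
const SET_C = ['1110010', '1100110', '1101100', '1000010', '1011100',
               '1001110', '1010000', '1000100', '1001000', '1110100'];

// Swap light and dark modules (hypothetical helper).
function invert(bits) {
  return bits.split('').map(b => (b === '0' ? '1' : '0')).join('');
}

// Read the pattern in the opposite direction (hypothetical helper).
function reverse(bits) {
  return bits.split('').reverse().join('');
}

// C is the inverse of A, B is the reverse of C,
// and B needs both operations to be derived from A.
const cIsInverseOfA = SET_A.every((a, d) => invert(a) === SET_C[d]);
const bIsReverseOfC = SET_C.every((c, d) => reverse(c) === SET_B[d]);
const bIsReversedInverseOfA =
  SET_A.every((a, d) => reverse(invert(a)) === SET_B[d]);

console.log(cIsInverseOfA, bIsReverseOfC, bIsReversedInverseOfA);
```

All three checks hold for the tables as printed, which is why a width-based decoder can get by with sets A and B alone.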

In a given read, the first char MUST be either an A set char, indicating a forward read, or a C set char in a reverse read. In the reversed direction, a C set char is identical to a B set char. Therefore, the indication of a B set char as the first char of a read sets the direction to reversed.
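
Here is one way the direction rule might look in code. It is a sketch under assumptions: each decoded char is represented as an object `{ digit, set }` where `set` is 'A' or 'B' as judged from widths alone (so a forward right-half char shows up as 'A'), the function name `assemble` is hypothetical, and the parity table is the left half of the `alfabet` array in the encoder script above.

```javascript
// Left-half set patterns that encode the thirteenth (leading) digit,
// taken from the first six letters of each `alfabet` entry in the encoder.
const PARITY = ['AAAAAA', 'AABABB', 'AABBAB', 'AABBBA', 'ABAABB',
                'ABBAAB', 'ABBBAA', 'ABABAB', 'ABABBA', 'ABBABA'];

// read: 12 objects { digit: 0-9, set: 'A' | 'B' } in the order scanned.
// Returns the 13-digit string, or null if the read breaks the rules.
function assemble(read) {
  if (read.length !== 12) return null;
  let chars = read;
  if (read[0].set === 'B') {
    // A leading B char means the pen moved right to left. Put the chars
    // back in forward order; a reversed read swaps A-widths and B-widths,
    // so every set label is flipped as well.
    chars = read.slice().reverse().map(c => ({
      digit: c.digit,
      set: c.set === 'A' ? 'B' : 'A',
    }));
  }
  const pattern = chars.slice(0, 6).map(c => c.set).join('');
  const first = PARITY.indexOf(pattern);
  // Right-half chars (set C) have the same widths as set A.
  const rightOk = chars.slice(6).every(c => c.set === 'A');
  if (first < 0 || !rightOk) return null;
  return String(first) + chars.map(c => c.digit).join('');
}
```

A plain UPC read comes out with the pattern AAAAAA, so the recovered leading digit is 0, which matches the statement that an EAN-13 decoder also decodes UPC.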

This is more obvious when you convert the bit patterns into widths, as they would be presented from an ideal light pen.

Digit  Left Odd (A)  Left Even (B, EAN only)  Right (C)
0      3 2 1 1       1 1 2 3                  3 2 1 1
1      2 2 2 1       1 2 2 2                  2 2 2 1
2      2 1 2 2       2 2 1 2                  2 1 2 2
3      1 4 1 1       1 1 4 1                  1 4 1 1
4      1 1 3 2       2 3 1 1                  1 1 3 2
5      1 2 3 1       1 3 2 1                  1 2 3 1
6      1 1 1 4       4 1 1 1                  1 1 1 4
7      1 3 1 2       2 1 3 1                  1 3 1 2
8      1 2 1 3       3 1 2 1                  1 2 1 3
9      3 1 1 2       2 1 1 3                  3 1 1 2

Notice that now I pay no attention to the state of the elements (whether they are black or white). I literally do not care.

What you are looking at is just the widths (in modules) of the elements.

For each digit, you get a group of four timing elements that total seven modules, and must fit one of these patterns.
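
To make the width matching concrete, here is a small sketch that derives the width patterns from the set A bit patterns and looks up a group of four widths. The names `toWidths`, `WIDTH_TABLE` and `decodeWidths` are hypothetical, and the input is assumed to be already cleaned up to whole module counts.

```javascript
// Set A bit patterns for digits 0-9, from the first table.
const SET_A = ['0001101', '0011001', '0010011', '0111101', '0100011',
               '0110001', '0101111', '0111011', '0110111', '0001011'];

// Run-length encode a 7-module pattern into its four element widths.
function toWidths(bits) {
  const widths = [];
  let run = 1;
  for (let i = 1; i <= bits.length; i++) {
    if (i < bits.length && bits[i] === bits[i - 1]) {
      run++;
    } else {
      widths.push(run);
      run = 1;
    }
  }
  return widths;
}

// Lookup keyed by the four widths, e.g. "3211" -> { digit: 0, set: 'A' }.
// Set B widths are the set A widths reversed; set C widths equal set A,
// so no separate entries are needed for the right half.
const WIDTH_TABLE = {};
SET_A.forEach((bits, digit) => {
  const w = toWidths(bits);
  WIDTH_TABLE[w.join('')] = { digit: digit, set: 'A' };
  WIDTH_TABLE[w.slice().reverse().join('')] = { digit: digit, set: 'B' };
});

// widths: four module counts that should total seven.
// Returns { digit, set } or null if the group fits no pattern.
function decodeWidths(widths) {
  return WIDTH_TABLE[widths.join('')] || null;
}

console.log(decodeWidths([1, 2, 3, 1])); // the "5" from set A
```

The twenty keys in the table are all distinct, so a group of four widths either identifies one digit and one set or is rejected.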

The last digit in the UPC/EAN code is a checksum, which ensures the integrity of the data. In my experience, this is rarely needed if the lower level decoding is working properly, but I would certainly urge you to implement this checksum test, as I do in all the versions I have produced.
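
A minimal sketch of the checksum test follows. The weighting (1 for even positions, 3 for odd positions, counting from zero) is the same one the encoder script above uses; the function name `checksumOk` and the choice to pad a 12-digit UPC with a leading zero are my assumptions.

```javascript
// code: string of 13 digits (EAN-13) or 12 digits (UPC).
// Returns true if the last digit matches the checksum of the others.
function checksumOk(code) {
  if (!/^[0-9]{12,13}$/.test(code)) return false;
  // Pad UPC to 13 digits so one weighting rule covers both codes.
  const ean = code.padStart(13, '0');
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    const d = ean.charCodeAt(i) - 48; // '0' is character code 48
    sum += (i % 2 === 0) ? d : 3 * d;
  }
  const expected = (10 - (sum % 10)) % 10;
  return expected === ean.charCodeAt(12) - 48;
}

console.log(checksumOk('4006381333931')); // true
console.log(checksumOk('4006381333932')); // false
```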

In addition to the main code, there may be either two or five digit supplemental codes. Decoding these is pretty straightforward, but there is one land-mine you should watch out for. Reading five digit supplementals in reverse can cause bad reads. You aren't really supposed to read these backward. In HP's barcode chips, this isn't even an option. Unfortunately, using a light pen, you don't have any way to force the user to read in a particular direction. The reason for the problem is that there are no leading guard bars on the right side of the supplementals, and there is no checksum to protect them. If the light pen has not adequately set up its data slicer, then the last element or two (rightmost) may be distorted enough to read as a different digit.

In one particular test I kept track of all errors, and ran standard EAN, EAN with two digit, and EAN with five digit supplementals, in forward and reverse. The system read on the first read every time, and every error was due to reading five digit supplementals in reverse. Even so, the misread rate on the reversed five digit supplementals was only about 3%. Unfortunately, there is no mechanism in the UPC/EAN code to prevent these misreads. A blemish on an otherwise sterling system, IMHO.

Pointers

UPC/EAN is a very well constructed code. There are no ways that I know of to get false reads if you follow the decoding rules.

Structure:

The basic UPC/EAN barcode is formed from three sets of framing elements, and two sets of data elements.

FFF DDDD DDDD DDDD DDDD DDDD DDDD FFFFF DDDD DDDD DDDD DDDD DDDD DDDD FFF
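
The layout above amounts to 59 elements (3 + 24 + 5 + 24 + 3). As a sketch, assuming a complete forward scan has already been captured as an array of 59 element widths, it could be cut up like this; the function name `splitElements` and the field names are hypothetical.

```javascript
// elements: array of 59 element widths in scan order.
// Returns the framing sets and the twelve groups of four data elements,
// or null if the element count is wrong.
function splitElements(elements) {
  if (elements.length !== 59) return null;
  // Collect six consecutive groups of four elements starting at `start`.
  const groupsOfFour = (start) => {
    const groups = [];
    for (let i = 0; i < 6; i++) {
      groups.push(elements.slice(start + 4 * i, start + 4 * i + 4));
    }
    return groups;
  };
  return {
    leftGuard: elements.slice(0, 3),     // FFF
    leftDigits: groupsOfFour(3),         // six DDDD groups
    centerGuard: elements.slice(27, 32), // FFFFF
    rightDigits: groupsOfFour(32),       // six DDDD groups
    rightGuard: elements.slice(56, 59),  // FFF
  };
}
```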

The outer element of each set of framing elements is a bar, so each of the two outer framing sets reads Bar, Space, Bar. The widths of these framing elements are all nominally equal, so they are a good indicator of how much distortion is present in the printing process and in the pen decoding. One method to calibrate for this is to average the two bars, subtract the space, and take half of the difference. That correction is then subtracted from every bar and added to every space as the samples come in.

Example: 22,20,22. So, 22+22 is 44, 44/2 is 22, 22-20 is 2, 2/2 is 1, so in this example, all bars should be decreased by one sample, and all spaces should be increased by one sample.
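
The same arithmetic as a sketch in code. The function names are hypothetical, and `applyCorrection` assumes the sample array alternates bar, space, bar, and so on, starting with a bar.

```javascript
// guard: [bar, space, bar] sample counts from an outer framing set.
// Returns the amount by which bars read too wide (negative if too narrow).
function guardCorrection(guard) {
  const meanBar = (guard[0] + guard[2]) / 2;
  return (meanBar - guard[1]) / 2;
}

// samples: alternating bar, space, bar, ... (assumed to start with a bar).
// Bars are decreased and spaces increased by the correction.
function applyCorrection(samples, correction) {
  return samples.map((s, i) => (i % 2 === 0 ? s - correction : s + correction));
}

const correction = guardCorrection([22, 20, 22]);
console.log(correction);                                // 1
console.log(applyCorrection([22, 20, 22], correction)); // [ 21, 21, 21 ]
```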

In the post-process approach, you might do this on both ends and on the middle samples (after you locate them!). In the "On-The-Fly" approach, you would use whatever set you hit first, and then recalibrate in the middle.

Cleanup

So now that you have the timing data, you can read the codes, right?

Well, yes, as long as they were printed very well, and read very accurately. Unfortunately, this is often not the case, so you need to do a little cleanup work before you decode.

First, take the four samples, and add them together.

We'll look at a "5" digit, encoded as 1 2 3 1

10,20,30,10 would be an ideal representation, because all the elements read as exact multiples of the single-module value, and they have pretty good granularity (10 counts per module). 20,40,60,20 would be even better, because of the finer granularity, but we'll stick with the smaller values as an example.

Note that at this point, we are already looking at data that has been corrected slightly, by the routine in the pen interrupt that compensates for the bar widths.

A more likely real set of data might look like this: 12,19,27,12

While it is still obvious to the human eye, it's no longer so obvious to the processor. Let's fix it.

First, sum the elements: 12+19+27+12= 70 and divide by seven, to get the standard module for this data. (10)

Now looking at each element, we want to "fix" each element to the nearest module width, but we must be very careful how this is done! Remember, the outer edges of the group are the only things you are sure of.

The trick here is to apply the correction from the outside in. You have three "edges" that you need to move.

Without getting into specific code: round a sample to the nearest whole number of modules, take the remainder, and add it to the next sample inward. Do this in a specific order, from outer to inner.

Fixing the left half, 12,19,27,12 becomes 10,21,27,12

Fixing the right half, 10,21,27,12 becomes 10,21,29,10

Fixing the center, 10,21,29,10 becomes 10,20,30,10
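
The three steps above can be sketched as follows. This is my reading of the procedure, not the original author's code: the names `cleanDigit` and `snap` are hypothetical, and the input is assumed to be the four sample counts of a single digit.

```javascript
// samples: four sample counts for one digit, e.g. [12, 19, 27, 12].
// Returns the four element widths in whole modules.
function cleanDigit(samples) {
  const s = samples.slice();
  // One module is a seventh of the digit's total width.
  const unit = (s[0] + s[1] + s[2] + s[3]) / 7;

  // Snap element `from` to the nearest whole number of modules and
  // push the remainder into its inward neighbour `to`, so that the
  // total number of samples is conserved.
  function snap(from, to) {
    const snapped = Math.round(s[from] / unit) * unit;
    s[to] += s[from] - snapped;
    s[from] = snapped;
  }

  snap(0, 1); // fix the left edge:   12,19,27,12 -> 10,21,27,12
  snap(3, 2); // fix the right edge:  10,21,27,12 -> 10,21,29,10
  snap(1, 2); // fix the centre edge: 10,21,29,10 -> 10,20,30,10

  return s.map(x => Math.round(x / unit));
}

console.log(cleanDigit([12, 19, 27, 12])); // [ 1, 2, 3, 1 ]
```

The result can then be matched against the width table to get the digit. Note that this sketch does nothing special if an element rounds to zero modules; real data that bad would need to be rejected.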

Cute, eh? This corrects for many errors in printing and scanning. The important point is that you have to conserve the samples. Rounding all the samples by themselves will result in disaster. If you don't believe me, then try it on some real data. Be sure to try "6"s, as they have only "1" module and "4" module widths.

Also, set up some data, distorted by nearly (but not equal to) 50%, and watch it settle back into correct values. Magic!

For real laughs, try the same experiment, but apply the correction from left to right and see what a mess it makes.
