Charset Extractor from Images

Introduction

This small utility extracts characters from a bitmap and converts every character to a string of binary words. You prepare the bitmap, specify a few options, and the program generates the table.

The word can have an arbitrary size. For SX micros it is more convenient to choose a 12-bit word, and for PICs an 8-bit word (or a wider one for those PICs which can read their program memory). The way pixels are packed into words is fully adjustable. It is possible to pack a pixel line (a row or a column of pixels in a character image) into one or more words, or even several pixel lines into one or more words.

The bitmap must be in PNG (Portable Network Graphics) format. Sorry, there is no BMP or GIF support currently. PNG is well suited for the task and was the easiest for me to get started with. Hopefully this is not a problem, as many graphics editors can read and write PNG files.

Download

If you use Windows, you can download the png2charset executable archived in a .zip file (53 kB).

If you don't, you will need to compile it yourself. The program is rather small, so there shouldn't be many problems with building it. png2charset is written in C++. Source files are packed into a .zip archive, download here (9 kB). Besides that, you will need to have (or compile) the libpng and zlib C libraries. The program can be compiled, for example, with GCC (GNU Compiler Collection) for Windows, which is freely available.

Usage

Synopsis:

png2charset -w width -h height -b wordwidth [-m mapping] [-r hexprefix] [-v] in.png [out.asm]

-w width       character width in pixels
-h height      character height in pixels
-b wordwidth   word width to translate the character data to
-m mapping     optional mapping of each pixel line in a character
               to words, like a7a6a5a4a3a2a1a0b7b6b5b4-c7c6c5c4c3c2c1c0b3b2b1b0
-r hexprefix   prefix indicating a hex number ($ by default)
-v             translate characters by vertical lines (default is by horizontal)

Detailed description:

The bitmap in the file in.png should contain individual character images arranged in a matrix with no gaps between them. The program reads the size of the image, takes the size of a character from the -w and -h options, and determines how many characters the image holds in each row and column. The top left character is assumed to be the space (ASCII code 0x20). All others are counted up from it to the right, then down, and so on.
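The layout rule above can be illustrated with a short Python sketch. It is not taken from the png2charset sources: the function names are hypothetical, and the use of integer division for an image that is not an exact multiple of the character size is an assumption.

```python
def char_grid(image_width, image_height, char_width, char_height):
    """Return (columns, rows) of whole characters that fit in the image.

    Assumption: partial cells at the right or bottom edge are ignored.
    """
    if char_width <= 0 or char_height <= 0:
        raise ValueError("character size must be positive")
    return image_width // char_width, image_height // char_height


def char_code(column, row, columns, first_code=0x20):
    """Code of the cell at (column, row), counting left to right, then down."""
    return first_code + row * columns + column


def char_origin(code, columns, char_width, char_height, first_code=0x20):
    """Top-left pixel (x, y) of the cell that holds the given character code."""
    row, column = divmod(code - first_code, columns)
    return column * char_width, row * char_height


# Hypothetical 192x56 image filled with 6x8 characters.
cols, rows = char_grid(192, 56, 6, 8)
print(cols, rows)                          # 32 7
print(hex(char_code(1, 1, cols)))          # 0x41, the letter 'A'
print(char_origin(ord("A"), cols, 6, 8))   # (6, 8)
```

With 32 columns and 7 rows the image holds 224 characters, that is, codes 32 to 255.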

After the bitmap is read into memory, the program extracts a bitmap for each character and then translates it to a set of binary words. The word size is specified by the -b option.
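As a rough illustration of the extraction step, the sketch below cuts one character cell out of an image held as a list of rows of grayscale values. The representation, the function name extract_char and the threshold of 128 are all assumptions made for the example; the real program reads the PNG through libpng and its conversion to black and white is not described here. Light pixels are treated as set, matching the "light characters on a dark background" convention.

```python
def extract_char(pixels, column, row, char_width, char_height, threshold=128):
    """Return one character cell as a list of '0'/'1' strings, one per pixel row.

    pixels is a list of rows of grayscale values (0..255); values at or
    above the (assumed) threshold count as set pixels.
    """
    x0 = column * char_width
    y0 = row * char_height
    cell = []
    for y in range(y0, y0 + char_height):
        cell.append("".join(
            "1" if pixels[y][x] >= threshold else "0"
            for x in range(x0, x0 + char_width)
        ))
    return cell


# A tiny 4x2 "image" holding two 2x2 characters side by side.
image = [
    [0, 255, 200, 10],
    [255, 0, 30, 220],
]
print(extract_char(image, 0, 0, 2, 2))  # ['01', '10']
print(extract_char(image, 1, 0, 2, 2))  # ['10', '01']
```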

The flexibility of png2charset comes from the idea of mapping pixels to words (-m option). The mapping string specifies which bit of which word each pixel is copied to.

First, the character bitmap is divided into lines of pixels. By default it is divided into horizontal lines, but if the -v option is specified, then it is divided into vertical lines instead. (Horizontal splitting may be useful, for example, for video graphics generation, and vertical splitting for wide LED displays.) Each line is then written into one or more consecutive words (the table entries), one pixel at a time. The process is controlled by the mapping string, which, in general, specifies a mapping from a group of pixel lines to a group of words. Within the group, words are named a, b, c, and so on. Bits in each word are numbered 0 (LSb), 1, 2, ..., 9, A, B, C, and so on (note the upper case). For example, a0 in the string means that the corresponding pixel is copied to bit 0 of the first word, and cA means bit 10 of the third word. Pixel lines are separated by a dash character '-'. Pixels in horizontal lines are scanned from left to right, and pixels in vertical lines from top to bottom.
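The following Python sketch shows one way to read these rules. It is an illustration, not the parser from the png2charset sources: the names split_lines and parse_mapping are hypothetical, the left-to-right order of vertical lines is an assumption, and so is the decision to reject a bit that is used twice.

```python
BIT_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # bit 10 is 'A', 11 is 'B', ...


def split_lines(char_rows, vertical=False):
    """Split a character (list of '0'/'1' row strings) into pixel lines.

    Horizontal lines are the rows themselves, scanned left to right.
    Vertical lines are the columns, each scanned top to bottom; taking
    the columns from left to right is an assumption.
    """
    if not vertical:
        return list(char_rows)
    width = len(char_rows[0])
    return ["".join(row[x] for row in char_rows) for x in range(width)]


def parse_mapping(mapping, word_width):
    """Turn a mapping string into a list of pixel lines.

    Each pixel line is a list of (word_index, bit_index) pairs, one per
    pixel, in scan order. Word 'a' is index 0, 'b' is 1, and so on.
    """
    lines, seen = [], set()
    for part in mapping.split("-"):
        if not part or len(part) % 2:
            raise ValueError(f"malformed pixel line: {part!r}")
        targets = []
        for i in range(0, len(part), 2):
            word = ord(part[i]) - ord("a")
            bit = BIT_DIGITS.find(part[i + 1])
            if not 0 <= word < 26:
                raise ValueError(f"bad word letter: {part[i]!r}")
            if bit < 0 or bit >= word_width:
                raise ValueError(f"bad bit digit: {part[i + 1]!r}")
            if (word, bit) in seen:
                raise ValueError(f"bit used twice: {part[i:i + 2]}")
            seen.add((word, bit))
            targets.append((word, bit))
        lines.append(targets)
    return lines


print(parse_mapping("a0", 8))    # [[(0, 0)]]
print(parse_mapping("cA", 12))   # [[(2, 10)]]
print(split_lines(["01", "10", "11"], vertical=True))  # ['011', '101']
```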

For example, take this 6x8 character:

011100
100010
100000
101110
100010
100010
011100
000000

With a word size of 8 bits and the mapping a5a4a3a2a1a0, the result is:

011100,100010,100000,101110,100010,100010,011100,000000

However, the two upper bits of each word are unused. The data can be packed more tightly with a different mapping, such as
a7a6a5a4a3a2-a1a0b7b6b5b4-b3b2b1b0c7c6-c5c4c3c2c1c0.

This packs four pixel lines into three 8-bit words:

01110010,00101000,00101110,10001010,00100111,00000000

Note that bits and words in the mapping can be in any order. More mappings are listed in the Examples section.
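The packing example can be checked with a small Python sketch. It is an independent illustration of the mapping rules, not the program's own code; the function names pack and to_binary are hypothetical, and leaving unmapped bits at zero is an assumption.

```python
CHAR = [
    "011100",
    "100010",
    "100000",
    "101110",
    "100010",
    "100010",
    "011100",
    "000000",
]

BIT_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def pack(pixel_lines, mapping):
    """Pack pixel lines ('0'/'1' strings) into integer words using a mapping string."""
    # One list of (word_index, bit_index) targets per pixel line in the group.
    group = [
        [(ord(p[i]) - ord("a"), BIT_DIGITS.index(p[i + 1]))
         for i in range(0, len(p), 2)]
        for p in mapping.split("-")
    ]
    words_per_group = 1 + max(word for line in group for word, _ in line)
    out = []
    # Consume the pixel lines one group at a time.
    for start in range(0, len(pixel_lines), len(group)):
        words = [0] * words_per_group  # unmapped bits stay 0 (assumption)
        for targets, line in zip(group, pixel_lines[start:start + len(group)]):
            for (word, bit), pixel in zip(targets, line):
                if pixel == "1":
                    words[word] |= 1 << bit
        out.extend(words)
    return out


def to_binary(words, word_width):
    """Format words as fixed-width binary strings."""
    return [format(w, f"0{word_width}b") for w in words]


print(to_binary(pack(CHAR, "a5a4a3a2a1a0"), 8))
# ['00011100', '00100010', '00100000', '00101110',
#  '00100010', '00100010', '00011100', '00000000']

packed = "a7a6a5a4a3a2-a1a0b7b6b5b4-b3b2b1b0c7c6-c5c4c3c2c1c0"
print(to_binary(pack(CHAR, packed), 8))
# ['01110010', '00101000', '00101110', '10001010', '00100111', '00000000']
```

The second result matches the six words listed above: eight pixel lines form two groups of four, and each group fills three 8-bit words.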

If the mapping is not specified, it is created automatically: each pixel line is copied into the minimum number of words, aligned to the right.
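One plausible reading of this rule is sketched below in Python. For a line that fits in a single word it places the last pixel in bit 0, which yields mappings such as a5a4a3a2a1a0. For a line that spans several words, the sketch assumes that the words are filled as one long right-aligned value with the first word holding the leftover high bits; the text does not confirm this detail, and the function name default_mapping is hypothetical.

```python
BIT_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def default_mapping(line_length, word_width):
    """Build a right-aligned mapping string for one pixel line.

    Assumption: when the line needs several words, they are treated as one
    concatenated value (first word most significant), aligned to the right.
    """
    if not 0 < word_width <= len(BIT_DIGITS):
        raise ValueError("unsupported word width")
    n_words = -(-line_length // word_width)   # ceiling division
    if n_words > 26:
        raise ValueError("too many words for one pixel line")
    parts = []
    for i in range(line_length):
        position = line_length - 1 - i        # bit position in the long value
        word = n_words - 1 - position // word_width
        bit = position % word_width
        parts.append(chr(ord("a") + word) + BIT_DIGITS[bit])
    return "".join(parts)


print(default_mapping(6, 8))    # a5a4a3a2a1a0
print(default_mapping(10, 8))   # a1a0b7b6b5b4b3b2b1b0
print(default_mapping(12, 12))  # aBaAa9a8a7a6a5a4a3a2a1a0
```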

After the bitmap is translated to binary words, they are printed as a table to the file out.asm if it is specified, or to standard output otherwise. Entries are written after the DW directive (find and replace it with a different one if necessary; the right directive depends on the selected word size and the requirements of your assembler). Words are printed in hexadecimal, and the prefix can be selected with the -r option (the default prefix is $).
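A minimal Python sketch of such a table writer follows. Only the DW directive, the hexadecimal format and the configurable prefix come from the description above; the number of entries per line, the zero padding, the comma separator and the name format_table are assumptions, so the real output of png2charset may be laid out differently.

```python
def format_table(words, word_width, prefix="$", directive="DW", per_line=8):
    """Format integer words as assembler table lines.

    The layout (entries per line, padding, separators) is an assumption
    made for illustration only.
    """
    digits = (word_width + 3) // 4   # hex digits needed for one word
    lines = []
    for i in range(0, len(words), per_line):
        entries = ", ".join(
            f"{prefix}{w:0{digits}X}" for w in words[i:i + per_line]
        )
        lines.append(f"\t{directive}\t{entries}")
    return "\n".join(lines)


print(format_table([0x72, 0x28, 0x2E, 0x8A, 0x27, 0x00], 8))
print(format_table([0x072, 0xABC], 12, prefix="0x"))
```

The first call produces a single DW line with the entries $72, $28, $2E, $8A, $27, $00; the second shows a 12-bit table with a 0x prefix, as would be selected with -r 0x.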

Examples

As an example, I have generated charset tables for a number of character sizes:

Tip: It is really easy to use the DOS box in Windows as a font image generator! Just display the whole character set in it, select the required font size, capture the image with a screen capture utility, and edit it in a graphics editor (crop everything but the character set, and make the characters light on a dark background). The colors do not have to be exactly black and white; png2charset converts the image to black and white automatically. Here is a text file with character codes 32 to 255 (ASCII), which can be used for that purpose.
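If the text file is not at hand, a similar one can be produced with a few lines of Python. This is only a sketch: the row length of 32 characters, the CR LF line endings and the file name charset.txt are assumptions, and which glyphs appear for codes above 127 depends on the code page of the console.

```python
def charset_bytes(first=32, last=255, per_row=32):
    """Return the raw bytes first..last, split into rows ending in CR LF."""
    rows = []
    for start in range(first, last + 1, per_row):
        stop = min(start + per_row, last + 1)
        rows.append(bytes(range(start, stop)))
    return b"\r\n".join(rows) + b"\r\n"


def write_charset(path="charset.txt"):
    """Write the character set to a file (hypothetical file name)."""
    with open(path, "wb") as f:
        f.write(charset_bytes())


data = charset_bytes()
print(len(data))            # 238: 224 characters plus 7 line endings
print(data[:8])             # b' !"#$%&\''
```

Calling write_charset() and then displaying the file in the DOS box gives 7 rows of 32 characters, ready to be captured.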

4x6

Batch file

SX 12 bit table (-m aBaAa9a8-a7a6a5a4-a3a2a1a0)

PIC 8 bit table (-m a7a6a5a4-a3a2a1a0)

PIC 14 bit table (-m aBaAa9a8-a7a6a5a4-a3a2a1a0)

5x12

Batch file

SX 12 bit table (-m aBaAa9a8a7-a4a3a2a1a0)

PIC 8 bit table (-m a6a5a4a3a2-a1a0b7b6b5-b4b3b2b1b0)

PIC 14 bit table (-m aCaBaAa9a8-a4a3a2a1a0)

6x8

Batch file

SX 12 bit table (-m aBaAa9a8a7a6-a5a4a3a2a1a0)

PIC 8 bit table (-m a7a6a5a4a3a2-a1a0b7b6b5b4-b3b2b1b0c7c6-c5c4c3c2c1c0)

PIC 14 bit table (-m aDaCaBaAa9a8-a5a4a3a2a1a0)

7x12

Batch file

SX 12 bit table (-m a6a5a4a3a2a1a0-aAa9a8bBbAb9b8-b6b5b4b3b2b1b0)

PIC 8 bit table

PIC 14 bit table (-m aDaCaBaAa9a8a7-a6a5a4a3a2a1a0)

8x12

Batch file

SX 12 bit table (-m a7a6a5a4a3a2a1a0-aBaAa9a8bBbAb9b8-b7b6b5b4b3b2b1b0)

PIC 8 bit table

PIC 14 bit table (-m a7a6a5a4a3a2a1a0-aBaAa9a8bBbAb9b8-b7b6b5b4b3b2b1b0)

10x18

Batch file

SX 12 bit table

PIC 8 bit table (-m b5b4a7a6a5a4a3a2a1a0-b1b0c7c6c5c4c3c2c1c0)

PIC 14 bit table

12x16

Batch file

SX 12 bit table

PIC 8 bit table (-m a7a6a5a4a3a2a1a0b7b6b5b4-b3b2b1b0c7c6c5c4c3c2c1c0)

PIC 14 bit table

News

Feb 12, 2001 Added vertical splitting option. Enhanced error detection in mapping string.
Feb 9, 2001 Initial release

©2001 Nikolai Golovchenko

Philip Adam Pemberton Says:

There's a nasty bug in CharsetImage.cpp on line 336 - at the end of CharsetImage::createCharSet(), this line needs to be inserted before the final closing brace:
return this;

Otherwise compilation will succeed, but png2charset will fail to run, returning an "Error while initializing" error. There are also a few other errors in the code, though these don't appear to be significant enough to cause any major trouble (just a few compiler warnings).
