Roman Black on 2001-04-10 02:56:18 AM had some really good points and presented some good ideas.

Most data compression and decompression algorithms don't use any multiplies or divides, so it's no big deal that PICs have slow multiplies.

The *fastest* data compression routine I've seen is RLE (which is useless on English text). The next fastest I've seen is LZRW1A: http://www.ross.net/compression/lzrw1a.html (both the compression and decompression source code are there). But it needs a block of "history" RAM when decompressing, and the bigger this block, the better the compression. Does the PIC have enough RAM for "adequate" compression?

Some other "small" data compression ideas were discussed on Deja (now Google Groups) under the subject headings "compression on a PDA" and "Compressing the Bible for a PDA":
http://groups.google.com/groups?q=%22Compressing+the+Bible+for+a+PDA%22&hl=en&lr=&safe=off&btnG=Google+Search&meta=site%3Dgroups%26group%3Dcomp.compression.*
http://groups.google.com/groups?q=%22compression+on+a+PDA%22&hl=en&lr=&safe=off&btnG=Google+Search&meta=site%3Dgroups%26group%3Dcomp.compression.*
(be sure to view the whole threads).

I hear that spell checkers compress their lists of spelling words by storing, for each word: a few bits indicating how many letters it has in common with the start of the previous word (zero for the first word, for the first word starting with B, etc.); a few bits indicating how far to increment the previous word's letter at that position to get this word's next letter (omitted when the previous word has no letter at that position); followed by the rest of the word. For example,

  ex fear feared fearing fed feed

(26 bytes, not counting end-of-word indicators) would be encoded

  <0> e x  <0> <1> e a r  <4> e d  <4> <4> n g  <2> <3>  <2> <1> d

(If you just put the letters into bytes and the numbers into nybbles, rather than something more efficient, this gives 15 bytes, not counting end-of-word indicators.)
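The spell-checker scheme above can be sketched in a few lines of Python (hypothetical code; the function names are mine and not from any particular spell checker, and the token stream here mixes small integers with literal letters where a real implementation would pack the integers into a few bits each):

```python
# "Front coding" of a sorted word list: each word stores how many letters
# it shares with the previous word, optionally how far to increment the
# previous word's next letter, then the remaining letters.

def front_encode(words):
    tokens, prev = [], ""
    for w in words:
        n = 0                            # letters shared with start of prev
        while n < min(len(prev), len(w)) and prev[n] == w[n]:
            n += 1
        tokens.append(n)
        rest = w[n:]
        if n < len(prev):
            # prev has a letter at position n; in sorted order this word
            # must continue past the shared prefix, so store the increment
            tokens.append(ord(rest[0]) - ord(prev[n]))
            rest = rest[1:]
        tokens.extend(rest)
        prev = w
    return tokens

def front_decode(tokens):
    words, prev, i = [], "", 0
    while i < len(tokens):
        n = tokens[i]
        i += 1
        w = prev[:n]
        if n < len(prev):                # an increment token follows
            w += chr(ord(prev[n]) + tokens[i])
            i += 1
        while i < len(tokens) and isinstance(tokens[i], str):
            w += tokens[i]               # literal letters end the word
            i += 1
        words.append(w)
        prev = w
    return words
```

Encoding the six example words reproduces the token stream quoted above (ten letters plus ten small numbers, i.e. the quoted 15 bytes if each number fits in a nybble), and decoding round-trips back to the word list.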
I was playing with a very simple compression algorithm based on the letter frequencies in
http://www.piclist.com/techref/method/compress/embedded.htm and
http://www.piclist.com/techref/method/compress/etxtfreq.htm
(which really needs an "up" link to http://www.piclist.com/techref/method/compress.htm ) that decodes like this:

  100: space
  101: e
  110: t
  111: n
  0100: r
  0101: o
  0110: a
  0111: i
  00xxxx_xxxx: all other (8-bit) values

I think I could decode this with a pretty compact program on a PIC. I think I could even *encode* this with a small subroutine on the PIC. It has the feature that I can encode *any* sequence of raw bytes, so I can send funky control codes.

If I didn't need all possible 8-bit values, I could combine this with Roman Black's idea and further decode some of those "unused" 10-bit symbols (obviously I don't need the ones for "_etnroai") into letter pairs, triplets, or entire words.

True Morse code compresses almost as well as the above scheme. If I use "1" for dot, "01" for dash, and "00" for end-of-letter, I get

  00: space
  100: e
  0100: t
  01100: n
  101100: r
  01010100: o
  10100: a
  1100: i
  11100: s
  ...

for the most common letters.

-- 
David Cary

-- 
http://www.piclist.com hint: To leave the PICList
mailto:piclist-unsubscribe-request@mitvma.mit.edu
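Both bit codes described in the post are easy to sketch in Python (hypothetical code, just to check the bit patterns; the symbol tables are copied straight from the post, everything else is mine):

```python
# The 8-symbol prefix code from the post: 3-bit codes for the four most
# common characters, 4-bit codes for the next four, and a 2-bit "00"
# escape followed by the raw 8-bit byte for everything else.
PREFIX = {' ': '100', 'e': '101', 't': '110', 'n': '111',
          'r': '0100', 'o': '0101', 'a': '0110', 'i': '0111'}

def prefix_encode(text):
    return ''.join(PREFIX.get(ch, '00' + format(ord(ch), '08b'))
                   for ch in text)

def prefix_decode(bits):
    rev = {v: k for k, v in PREFIX.items()}
    out, i = [], 0
    while i < len(bits):
        if bits.startswith('00', i):     # escape: next 8 bits are a raw byte
            out.append(chr(int(bits[i + 2:i + 10], 2)))
            i += 10
        elif bits[i] == '1':             # "1xx": 3-bit code
            out.append(rev[bits[i:i + 3]])
            i += 3
        else:                            # "01xx": 4-bit code
            out.append(rev[bits[i:i + 4]])
            i += 4
    return ''.join(out)

# The Morse-style code: "1" for dot, "01" for dash, "00" for
# end-of-letter; a space is simply an empty letter (a bare "00").
MORSE = {'e': '.', 't': '-', 'n': '-.', 'r': '.-.', 'o': '---',
         'a': '.-', 'i': '..', 's': '...'}

def morse_encode(text):
    out = []
    for ch in text:
        for mark in MORSE.get(ch, ''):
            out.append('1' if mark == '.' else '01')
        out.append('00')
    return ''.join(out)
```

The prefix code round-trips any byte sequence through `prefix_decode(prefix_encode(...))`, and `morse_encode` reproduces the bit strings listed in the post (e.g. "e" gives 100 and "o" gives 01010100).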