David Cary wrote:
> I hear that ``spell checkers'' compress their lists of spelling words by storing
> a few bits indicating how many letters this word had in common with the start of
> the previous word.....
> I was playing with a very simple compression algorithm based on the letter
> frequencies in
> http://www.piclist.com/techref/method/compress/embedded.htm
> and
> http://www.piclist.com/techref/method/compress/etxtfreq.htm
> (which really needs an "up" link to
> http://www.piclist.com/techref/method/compress.htm)
> that decoded like this:
>
> 100: space
> 101: e
> 110: t
> 111: n
> 0100: r
> 0101: o
> 0110: a
> 0111: i
> 00xxxx_xxxx: all other (8 bit) letters.
>
> I think I could decode this with a pretty compact program on a PIC.
> I think I could even *encode* this with a small subroutine on the PIC.
> It has the feature that I can encode *any* sequence of raw bytes, so I can send
> funky control codes.

This is a clever system and would give easy encoding and decoding, but will it really compress that much? Consider: if the 8 most common characters you picked account for 50% of the text, that half averages 3.5 bits/char (four 3-bit codes and four 4-bit codes), while the other 50% needs 10 bits each, so 50 x 3.5 + 50 x 10 = 675 bits for 100 characters. That is 6.75 bits/char, which is not really that good a compression. It IS excellent for keeping the lookup table very small, though, since most other decompressors require a large table. But how would you handle capitalisation, etc.? :o)

-Roman
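To make that concrete, here is a rough C sketch of an encoder and decoder for the code table quoted above. The helper names (bitbuf, put_bits, get_bit, common8) are purely illustrative, not from any PIC source; a real PIC version would shift bits through a register one byte at a time rather than index a RAM buffer like this.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* The eight characters that get short codes: 100..111, then 0100..0111. */
static const char common8[8] = {' ', 'e', 't', 'n', 'r', 'o', 'a', 'i'};

typedef struct { uint8_t buf[256]; int bitpos; } bitbuf;

/* Append n bits to the buffer, MSB first. */
static void put_bits(bitbuf *b, uint16_t bits, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if ((bits >> i) & 1)
            b->buf[b->bitpos >> 3] |= 0x80 >> (b->bitpos & 7);
        b->bitpos++;
    }
}

/* Read one bit at *pos, MSB first, and advance. */
static int get_bit(const bitbuf *b, int *pos) {
    int bit = (b->buf[*pos >> 3] >> (7 - (*pos & 7))) & 1;
    (*pos)++;
    return bit;
}

static void encode(const char *s, bitbuf *b) {
    for (; *s; s++) {
        const char *p = memchr(common8, *s, 8);
        if (p) {
            int idx = (int)(p - common8);
            if (idx < 4)
                put_bits(b, 0x4 | idx, 3);   /* 100..111 for space,e,t,n */
            else
                put_bits(b, idx, 4);         /* 0100..0111 for r,o,a,i   */
        } else {
            /* 10-bit literal: 00 prefix, then the raw 8-bit byte. */
            put_bits(b, (uint16_t)(unsigned char)*s, 10);
        }
    }
}

static int decode(const bitbuf *b, int total_bits, char *out) {
    int pos = 0, n = 0;
    while (pos < total_bits) {
        if (get_bit(b, &pos)) {              /* 1xx: space, e, t, n */
            int idx = get_bit(b, &pos) << 1;
            idx |= get_bit(b, &pos);
            out[n++] = common8[idx];
        } else if (get_bit(b, &pos)) {       /* 01xx: r, o, a, i */
            int idx = get_bit(b, &pos) << 1;
            idx |= get_bit(b, &pos);
            out[n++] = common8[4 + idx];
        } else {                             /* 00 + 8 raw bits */
            int c = 0;
            for (int i = 0; i < 8; i++)
                c = (c << 1) | get_bit(b, &pos);
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return n;
}

int main(void) {
    bitbuf b = {{0}, 0};
    char out[256];
    const char *msg = "the rain in spain";
    encode(msg, &b);
    decode(&b, b.bitpos, out);
    printf("%d bits for %d chars -> \"%s\"\n", b.bitpos, (int)strlen(msg), out);
    return 0;
}

Running it round-trips the sample text and prints the total bit count, so you can check the bits/char figure against your own data.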