David Cary wrote:
> I hear that ``spell checkers'' compress their lists of spelling words by storing
> a few bits indicating how many letters this word had in common with the start of
> the previous word.....
> I was playing with a very simple compression algorithm based on the letter
> frequencies in
> http://www.piclist.com/techref/method/compress/embedded.htm
> and
> http://www.piclist.com/techref/method/compress/etxtfreq.htm
> (which really needs an "up" link to
> http://www.piclist.com/techref/method/compress.htm)
> that decoded like this:
>
> 100: space
> 101: e
> 110: t
> 111: n
> 0100: r
> 0101: o
> 0110: a
> 0111: i
> 00xxxx_xxxx: all other (8 bit) letters.
>
> I think I could decode this with a pretty compact program on a PIC.
> I think I could even *encode* this with a small subroutine on the PIC.
> It has the feature that I can encode *any* sequence of raw bytes, so I can send
> funky control codes.

This is a clever system and would give easy encoding and decoding, but will it really compress that much? Consider: if the 8 most common characters you picked account for 50% of the text, that half averages 3.5 bits/char (four 3-bit codes and four 4-bit codes), while the other 50% needs 10 bits each, so 50 x 3.5 + 50 x 10 = 675 bits for 100 characters. That is 6.75 bits/char, which is not really that good a compression. It IS excellent for keeping the lookup table very small, though, since most other decompressors require a large table. But how would you handle capitalisation, etc.? :o)

-Roman
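To make that concrete, here is a rough C sketch of an encoder and decoder for the code table quoted above. The helper names (bitbuf, put_bits, get_bit, common8) are purely illustrative, not from any PIC source; a real PIC version would shift bits through a register one byte at a time rather than index a RAM buffer like this.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* The eight characters that get short codes: 100..111, then 0100..0111. */
static const char common8[8] = {' ', 'e', 't', 'n', 'r', 'o', 'a', 'i'};

typedef struct { uint8_t buf[256]; int bitpos; } bitbuf;

/* Append n bits to the buffer, MSB first. */
static void put_bits(bitbuf *b, uint16_t bits, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if ((bits >> i) & 1)
            b->buf[b->bitpos >> 3] |= 0x80 >> (b->bitpos & 7);
        b->bitpos++;
    }
}

/* Read one bit at *pos, MSB first, and advance. */
static int get_bit(const bitbuf *b, int *pos) {
    int bit = (b->buf[*pos >> 3] >> (7 - (*pos & 7))) & 1;
    (*pos)++;
    return bit;
}

static void encode(const char *s, bitbuf *b) {
    for (; *s; s++) {
        const char *p = memchr(common8, *s, 8);
        if (p) {
            int idx = (int)(p - common8);
            if (idx < 4)
                put_bits(b, 0x4 | idx, 3);   /* 100..111 for space,e,t,n */
            else
                put_bits(b, idx, 4);         /* 0100..0111 for r,o,a,i   */
        } else {
            /* 10-bit literal: 00 prefix, then the raw 8-bit byte. */
            put_bits(b, (uint16_t)(unsigned char)*s, 10);
        }
    }
}

static int decode(const bitbuf *b, int total_bits, char *out) {
    int pos = 0, n = 0;
    while (pos < total_bits) {
        if (get_bit(b, &pos)) {              /* 1xx: space, e, t, n */
            int idx = get_bit(b, &pos) << 1;
            idx |= get_bit(b, &pos);
            out[n++] = common8[idx];
        } else if (get_bit(b, &pos)) {       /* 01xx: r, o, a, i */
            int idx = get_bit(b, &pos) << 1;
            idx |= get_bit(b, &pos);
            out[n++] = common8[4 + idx];
        } else {                             /* 00 + 8 raw bits */
            int c = 0;
            for (int i = 0; i < 8; i++)
                c = (c << 1) | get_bit(b, &pos);
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return n;
}

int main(void) {
    bitbuf b = {{0}, 0};
    char out[256];
    const char *msg = "the rain in spain";
    encode(msg, &b);
    decode(&b, b.bitpos, out);
    printf("%d bits for %d chars -> \"%s\"\n", b.bitpos, (int)strlen(msg), out);
    return 0;
}

Running it round-trips the sample text and prints the total bit count, so you can check the bits/char figure against your own data.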