Byron, Thanks for the long explanation. Currently I'm not working on anything like text de-compression on PICs. But its nevertheless a very interesting topic. I've saved your message. Maybe it'll come in hand some time in the fututre. Thanks, Jeethu Rao > -----Original Message----- > From: pic microcontroller discussion list > [mailto:PICLIST@MITVMA.MIT.EDU]On Behalf Of Byron A Jeff > Sent: Friday, October 12, 2001 4:37 PM > To: PICLIST@MITVMA.MIT.EDU > Subject: Re: [ee]: text compression > > > On Fri, Oct 12, 2001 at 12:46:26PM +0530, Jeethu Rao wrote: > > > Huffman will work fine if you have a large set of small > strings that all > > > similar occurance frequencies of each character code. > > > > > > > Bob, > > This is real interesting. Have you done huffman on PICs ? I managed with > > RLE. > > And RLE is of no help on text strings. But for huffman I > could'nt imagine of > > a way to do. > > Well first off you'd only do huffman decoding on the PIC. The > encoding of the > strings would be done offline. > > Huffman decoding is generally done with a binary tree. It shouldn't be too > difficult to collapse a binary tree representation into a table. > All you'd have > to do is follow some simple rules like leaves of the tree have > bit 7 turned off > and interior pointers would have bit 7 turned on. For each > interior node have > a a consecutive pair of pointers for the left and right subtrees. > Then it's > just a matter of reading the next compressed bit, adding it to > current pointer > and accessing the table. If the top bit is set then proceed to > the next bit. > If not then that's the character. > > Here's a simple example. Three characters and their huffman encodings: > > A - 0 > B - 10 > C - 11 > > Here's the table: > > 0:01000001 ; The character A > 1:10000010 ; Subtree starts at table element 2 > 2:01000010 ; The character B > 3:01000011 ; The character C > > So the bit string 11010 would map out like this: > > - init to table element p=0, the root of the tree. > - Access the first bit, a 1, and add it to p, Access table[p] = 10000010 > - Since the top bit is 1, set p to the lower 7 bits (p=2) and continue. > - Access the next bit, a 1, and add it to p, access table[p] = > 01000011 ('C') > - Since the top bit is 0, We're done. Output a 'C'. Reset p=0 > > - Access the next bit, a 0, and add it to p, access table[p] = > 01000001 ('A') > - Since the top bit is 0, We're done. Output a 'A'. Reset p=0 > > - Access the next bit, a 1, and add it to p, access table[p] = 10000010 > - Since the top bit is 1, set p to the lower 7 bits (p=2) and continue. > - Access the next bit, a 0, and add it to p, access table[p] = > 01000010 ('B') > - Since the top bit is 0, We're done. Output a 'B'. Reset p=0 > > The output would be "CAB". > > Depending on the size of the string you can Huffman encode a null > character at > the end of the string, or preceed the string with its length. > > And of course a quick review of Huffman coding. Do a frequency analysis of > the characters in your strings and build a tree so that the most frequent > characters (statistically) have the shortest path in the tree. So in the > above example across all the strings, 'A' must have occured more > than 'B' or > 'C' because it has a shorter Huffman encoding. > > Hope this helps, > > BAJ > > -- > http://www.piclist.com#nomail Going offline? Don't AutoReply us! > email listserv@mitvma.mit.edu with SET PICList DIGEST in the body > > -- http://www.piclist.com hint: PICList Posts must start with ONE topic: [PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads