On Fri, Oct 12, 2001 at 12:46:26PM +0530, Jeethu Rao wrote: > > Huffman will work fine if you have a large set of small strings that all > > similar occurance frequencies of each character code. > > > > Bob, > This is real interesting. Have you done huffman on PICs ? I managed with > RLE. > And RLE is of no help on text strings. But for huffman I could'nt imagine of > a way to do. Well first off you'd only do huffman decoding on the PIC. The encoding of the strings would be done offline. Huffman decoding is generally done with a binary tree. It shouldn't be too difficult to collapse a binary tree representation into a table. All you'd have to do is follow some simple rules like leaves of the tree have bit 7 turned off and interior pointers would have bit 7 turned on. For each interior node have a a consecutive pair of pointers for the left and right subtrees. Then it's just a matter of reading the next compressed bit, adding it to current pointer and accessing the table. If the top bit is set then proceed to the next bit. If not then that's the character. Here's a simple example. Three characters and their huffman encodings: A - 0 B - 10 C - 11 Here's the table: 0:01000001 ; The character A 1:10000010 ; Subtree starts at table element 2 2:01000010 ; The character B 3:01000011 ; The character C So the bit string 11010 would map out like this: - init to table element p=0, the root of the tree. - Access the first bit, a 1, and add it to p, Access table[p] = 10000010 - Since the top bit is 1, set p to the lower 7 bits (p=2) and continue. - Access the next bit, a 1, and add it to p, access table[p] = 01000011 ('C') - Since the top bit is 0, We're done. Output a 'C'. Reset p=0 - Access the next bit, a 0, and add it to p, access table[p] = 01000001 ('A') - Since the top bit is 0, We're done. Output a 'A'. Reset p=0 - Access the next bit, a 1, and add it to p, access table[p] = 10000010 - Since the top bit is 1, set p to the lower 7 bits (p=2) and continue. - Access the next bit, a 0, and add it to p, access table[p] = 01000010 ('B') - Since the top bit is 0, We're done. Output a 'B'. Reset p=0 The output would be "CAB". Depending on the size of the string you can Huffman encode a null character at the end of the string, or preceed the string with its length. And of course a quick review of Huffman coding. Do a frequency analysis of the characters in your strings and build a tree so that the most frequent characters (statistically) have the shortest path in the tree. So in the above example across all the strings, 'A' must have occured more than 'B' or 'C' because it has a shorter Huffman encoding. Hope this helps, BAJ -- http://www.piclist.com#nomail Going offline? Don't AutoReply us! email listserv@mitvma.mit.edu with SET PICList DIGEST in the body