>You can try using Huffman encoding.
>The idea is to represent the highest occurring characters
>with least number of bits.
>Highest occurring characters for English text would be R,S,T,L,N etc.

The most common character in English text is " " (space) :-)
This is worth noting when you are using Huffman coding or any other code which depends on symbol frequency. "e" features very highly also.

Look at apps like PKZIP, which uses a number of algorithms depending on the structure of the source data.

Consider using fewer bits per character and packing the resulting 5-bit "words" into bytes. Baudot teletypes used 5 bits (32 symbols), which allows 26 letters plus 2 shift characters (shift to set A, shift to set B), giving a total of 48 working characters. Baudot didn't use upper/lower case AFAIR, but there is enough room in 48 characters to do so. This becomes inefficient IF you have lots of mixed case, e.g. "McMahon" :-), but is good for normal capitalisation.

E.g. "Hello, my name is Russell." = 26 characters = 26 bytes normally.
Using the above scheme, where ">" means shift to the second set (capitals, punctuation etc.) and "<" means shift back, this becomes 31 characters = 31 x 5 bits = 155 bits; 155/8 = 19.4, i.e. 20 bytes.

Actually, I'm a bit scrambled here as I haven't thought the upper case handling through properly. Still, it gives some indication of the modest gains possible. The maximum possible saving is 37.5% less than raw text (5 bits instead of 8 per character), and the shift characters eat into that. Not good enough.

I think that about 2:1 is achievable without really fancy efforts. Try Huffman.

Russell McMahon
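For anyone wanting to try the Huffman suggestion, here is a minimal Python sketch (not from the original post). As an assumption for illustration, it builds the code from symbol counts in the message itself rather than from English-language statistics, and it does not count the cost of storing or sending the code table, which a real coder must pay. The function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a prefix-free code {symbol: bit string} built from symbol counts in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: one distinct symbol still needs a 1-bit code.
        (only,) = freq
        return {only: "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker stops heapq from ever comparing the dicts.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 / 1 to their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + c for sym, c in left.items()}
        merged.update({sym: "1" + c for sym, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(text, code):
    return "".join(code[ch] for ch in text)


def huffman_decode(bits, code):
    lookup = {c: sym for sym, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:  # prefix-free, so the first match is the right symbol
            out.append(lookup[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    msg = "Hello, my name is Russell."
    code = huffman_code(msg)
    bits = huffman_encode(msg, code)
    print(len(msg) * 8, "bits raw ->", len(bits), "bits Huffman (table not counted)")
    for sym in sorted(code, key=lambda s: (len(code[s]), s)):
        print(repr(sym), code[sym])
```

Running it shows space and "l" (4 occurrences each) getting codes no longer than the one-off letters such as "H", which is the frequency point made above.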
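The 5-bit, two-set shift idea described above can be sketched in Python as follows. The character sets are a hypothetical choice, not real Baudot/ITA2: each set holds 30 symbols, codes 30 and 31 are locking shifts to set A and set B, space appears in both sets (so it never forces a shift), and the spare slots hold some punctuation. Function names are illustrative only.

```python
# Hypothetical 5-bit code with two character sets and locking shifts.
# Each set has 30 symbols (codes 0..29); codes 30 and 31 are the shifts.
SET_A = " abcdefghijklmnopqrstuvwxyz,.'"
SET_B = " ABCDEFGHIJKLMNOPQRSTUVWXYZ!?;"
SHIFT_A, SHIFT_B = 30, 31
SETS = (SET_A, SET_B)
SHIFTS = (SHIFT_A, SHIFT_B)


def encode_5bit(text):
    """Map text to 5-bit codes, inserting a shift only when the set must change."""
    cur = 0  # encoder and decoder both start in set A
    codes = []
    for ch in text:
        if ch not in SETS[cur]:
            other = 1 - cur
            if ch not in SETS[other]:
                raise ValueError(f"{ch!r} is in neither character set")
            codes.append(SHIFTS[other])
            cur = other
        codes.append(SETS[cur].index(ch))
    return codes


def decode_5bit(codes):
    cur = 0
    out = []
    for c in codes:
        if c == SHIFT_A:
            cur = 0
        elif c == SHIFT_B:
            cur = 1
        else:
            out.append(SETS[cur][c])
    return "".join(out)


def pack_5bit(codes):
    """Pack 5-bit codes MSB-first into bytes, zero-padding the final byte."""
    acc = 0
    for c in codes:
        acc = (acc << 5) | c
    nbits = 5 * len(codes)
    pad = -nbits % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_5bit(data, count):
    """Recover `count` codes; the count is stored separately because padding is ambiguous."""
    acc = int.from_bytes(data, "big")
    total = 8 * len(data)
    return [(acc >> (total - 5 * (i + 1))) & 0x1F for i in range(count)]
```

With these sets, "Hello, my name is Russell." needs 26 symbols plus 4 shifts = 30 codes = 150 bits, which pads to 19 bytes against 26 for 8-bit text. "McMahon" shows the mixed-case cost: 7 letters take 11 codes.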