Russell McMahon wrote:
>
> Doesn't get much more OT than this but I'm stuck so I'm "casting my net
> fairly wide"..
>
> I need a program to count the occurrence of each separate word in a
> document.
> eg how many of each of "why" , "what", "elephant", "aardvark" etc (NOT just
> the total number of words).

Hi Russell, this is a very specialised task. It is not that hard to do
in C, but you are going to need large buffers. You can use a circular
counter like the Lempel-Ziv algorithm (ZIP). But you really need to
specify a few details:

* Total size of document (characters)?
* Total number of words you need logged?
* Do you need EVERY word logged?

If the total number of logged words is reasonable, the job is very
different. Note that even in well-written C on a fast Pentium this is
going to take a LONG time if you want every word logged. With maybe
10,000 to 20,000 words in a language, at maybe 32 bytes each for count
and string (roughly 320 KB to 640 KB in all), you have memory issues
even in C on a big computer.

Have you looked at Solway's "bigtext"? This is a shareware program
that may compress your large document to about 20% to 25% of its size,
but I suspect you need more...
-Roman

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics
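
For what it's worth, the per-word counting discussed above could be sketched in C roughly like this. It is only a minimal sketch under assumptions of my own: it uses a fixed-size hash table with linear probing (not the circular counter mentioned above), treats a "word" as a run of letters folded to lower case, and truncates words to 31 characters. The names MAX_WORD, TABLE_SIZE, add_word, count_words, word_count and print_counts are all made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD   31     /* assumed limit; longer words are truncated */
#define TABLE_SIZE 32768  /* assumed limit on distinct words */

struct entry {
    char word[MAX_WORD + 1];
    unsigned long count;  /* 0 marks an empty slot */
};

struct entry table[TABLE_SIZE];

/* djb2-style string hash reduced to a table index. */
unsigned long word_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Record one occurrence of word. Returns 0, or -1 if the table is full. */
int add_word(const char *word)
{
    unsigned long i = word_hash(word), probes;
    assert(strlen(word) <= MAX_WORD);
    for (probes = 0; probes < TABLE_SIZE; probes++) {
        struct entry *e = &table[i];
        if (e->count == 0) {              /* empty slot: first sighting */
            strcpy(e->word, word);
            e->count = 1;
            return 0;
        }
        if (strcmp(e->word, word) == 0) { /* seen before: bump the count */
            e->count++;
            return 0;
        }
        i = (i + 1) % TABLE_SIZE;         /* linear probing */
    }
    return -1;
}

/* Split text into lower-case runs of letters and count each run.
   Note that "don't" is counted as "don" and "t" under this rule. */
int count_words(const char *text)
{
    char word[MAX_WORD + 1];
    size_t len = 0;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c)) {
            if (len < MAX_WORD)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {                /* a word just ended */
                word[len] = '\0';
                if (add_word(word) != 0)
                    return -1;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return 0;
}

/* Occurrences recorded for a lower-case word (0 if never seen). */
unsigned long word_count(const char *word)
{
    unsigned long i = word_hash(word), probes;
    for (probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[i].count == 0)
            return 0;
        if (strcmp(table[i].word, word) == 0)
            return table[i].count;
        i = (i + 1) % TABLE_SIZE;
    }
    return 0;
}

/* Print every recorded word with its count, in table (unsorted) order. */
void print_counts(FILE *out)
{
    unsigned long i;
    for (i = 0; i < TABLE_SIZE; i++)
        if (table[i].count > 0)
            fprintf(out, "%lu %s\n", table[i].count, table[i].word);
}
```

To use it you would read the document into a NUL-terminated buffer, call count_words() on it, then call print_counts(stdout). Feeding it fixed-size chunks instead would split any word that straddles a chunk boundary, so that case would need extra handling. The answers to the three questions above decide whether fixed limits like these are acceptable.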