How about Perl? It was designed for exactly this kind of task, and it is available for M$, Linux, Mac, and just about any OS you can imagine!

andy

----- Original Message -----
From: "Roman Black"
To:
Sent: Sunday, July 22, 2001 9:44 AM
Subject: Re: [OT]: Word counter program wanted

> Russell McMahon wrote:
> >
> > Doesn't get much more OT than this but I'm stuck so I'm "casting my net
> > fairly wide"..
> >
> > I need a program to count the occurrence of each separate word in a
> > document.
> > eg how many of each of "why" , "what", "elephant", "aardvark" etc (NOT just
> > the total number of words).
> >
>
> Hi Russell, this is a very specialty task. It is
> not that hard to do in C, but you are going to need
> large buffers. You can use a circular counter like
> the lempel-ziv algorithm (ZIP).
>
> But you really need to specify a few details:
> * Total size of document (characters)?
> * Total number of words you need logged?
> * Do you need EVERY word logged?
>
> If the total number of logged words is reasonable,
> the job is very different. Note even in well written
> C on a fast Pentium this is going to take a LONG
> time if you want every word logged, and with maybe
> 10,000 to 20,000 words in a language at maybe 32
> bytes each for count and string you have memory
> issues even in C on a big computer.
>
> Have you looked at Solway's "bigtext"?? This is
> a shareware program that may compress your large
> document to about 20% to 25% of its size, but I
> suspect you need more...
> -Roman

--
http://www.piclist.com hint: The PICList is archived three different ways.
See http://www.piclist.com/#archives for details.
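
For illustration only, here is a minimal sketch of the per-word tally Russell asked for. It is written in Python rather than Perl, but it uses the same hash-table idea a Perl script would. The function name `count_words` and the rule for what counts as a "word" (a run of letters, with apostrophes allowed inside) are assumptions made for this example, not anything specified in the thread.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a "word" is a run of ASCII letters, optionally containing
    # internal apostrophes (so "don't" is one word). Case is ignored.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


if __name__ == "__main__":
    sample = "Why the elephant? Why not the aardvark? The elephant knows why."
    # Print each distinct word with its count, most frequent first.
    for word, n in count_words(sample).most_common():
        print(f"{n:6d}  {word}")
```

To run it on a real document, read the file into a string and pass it to `count_words`. On the memory question, the figures quoted above (20,000 distinct words at about 32 bytes each) come to roughly 640 KB, so a table of this kind should fit comfortably in memory.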