On Sun, 22 Jul 2001, Jeff DeMaagd wrote:

> ----- Original Message -----
> From: Scott Dattalo
>
> > On Sun, 22 Jul 2001, wouter van ooijen & floortje hanneman wrote:
> >
> > > > I need a program to count the occurrence of each separate word in a
> > > > document.
> > >
> > > Using the right tool (tcl, python) this should be a one-banana trick.
> >
> > Using the right OS it's a free-banana trick: wc
>
> Last I checked that only made a total count of words, lines & bytes and not
> a run-down of how many occurrences of each particular word there was.

Yes, I misread the question. What you really want is:

    tr " " "\n" < fileofinterest | sort | uniq -ic > fileofcountedwords

This takes fileofinterest and converts all spaces to newlines, writing the
result to stdout. stdout is piped to sort, which will sort all of the words.
The output of this is piped into uniq, which collapses runs of identical
adjacent lines and prints the number of occurrences along with each word;
the -i flag makes the comparison ignore case. The results are stored in
fileofcountedwords.

It might be necessary to remove all of the punctuation prior to running it
through the commands. Eg:

    sed s/[^a-zA-Z\ ]/\ /g < input > output

As an example, if you take the above paragraph, put it into a file named j,
and execute the following command:

    sed s/[^a-zA-Z\ ]/\ /g < j | tr " " "\n" | sort | uniq -ic | sort -r

you get:

      9 the
      8
      5 to
      4 of
      4 all
      3 are
      3 and
      2 this
      2 stdout
      2 sort
      2 piped
      2 it
      2 is
      1 writes
      1 words
      1 word
      1 with
      1 will
      1 which
      1 where
      1 uniq
      1 through
      1 takes
      1 stored
      1 spaces
      1 running
      1 returns
      1 results
      1 result
      1 removed
      1 remove
      1 punctuation
      1 prior
      1 printed
      1 output
      1 occurences
      1 number
      1 necessary
      1 might
      1 into
      1 in
      1 fileofinterest
      1 fileofcountedwords
      1 Eg
      1 duplicates
      1 converts
      1 commands
      1 carriage
      1 be
      1 along

I guess that's a bushel of free bananas!

(Note: sed can't substitute a newline for a space, AFAIK.)
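As the quoted thread notes, the right scripting tool makes this a one-banana
trick as well. A minimal Python sketch of the same idea (the function name and
sample sentence are mine, purely for illustration) mirrors the three stages:
the sed step (strip non-letters), the tr step (split into words), and
sort | uniq -ic (case-insensitive tally):

```python
import re
from collections import Counter

def count_words(text):
    # sed step: replace anything that isn't a letter with a space.
    # tr step: split the result into one word per list item.
    # uniq -ic step: lowercase (ignore case) and tally occurrences.
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return Counter(words)

counts = count_words("To be, or not to be: that is the question.")
print(counts.most_common(2))  # -> [('to', 2), ('be', 2)]
```

Counter.most_common() plays the role of the final sort -r, listing the most
frequent words first.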
Scott

--
http://www.piclist.com hint: The PICList is archived three different ways.
See http://www.piclist.com/#archives for details.