At 09:12 PM 27/11/2002 -0600, you wrote:
>Ftp it to a UNIX box and use sort? 8-)

Actually, I could have done that, but this XP box has hugely more disk space, a much faster processor and something like double the RAM of the Linux (Debian) box that I have available.

>Or import into Excel and sort it, if Excel will take it. Or Access. Or
>install the Cygwin tools. Or install Perl and write a one-liner. That's
>a joke, see, *ALL* Perl programs can be written as one-liners.

Cygwin was the solution. It seems it's better not to reinvent the wheel. GNU sort can already handle this sort of thing. I like having bash around anyway. It's way better than DOS, well... for me.

>After the second margarita I find the possibilities are endless. I doubt
>any of the M$ Office products will work well on a 3GB file,

Most MS tools choke on that kind of file size. DOS sort was what I originally tried, but it turned out to be case insensitive, and case sensitivity is very important in this case. The GNU sort that comes with Cygwin is case sensitive, and therefore better.

>If you do not care about duplicates you can skip uniq.

Actually, duplicates are important. They're why I'm doing this to start with. I'm looking for the frequency of occurrence of character combinations. I already wrote the software to generate the character sequences using standard input and standard output, which made the buffering issues of the programme non-existent.

The entire sort and frequency count of the largest file took 2hr 43m to complete. Not bad, I think.

Thanks for all the suggestions,
Brendan

P.S. Who wants to know what the most common 5-character sequence in the English language is? It is (get this) " and ". That's right. Space, and, space.
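
For anyone wanting to try the same thing, here is a minimal sketch of the kind of sort-and-count pipeline discussed in this thread. It is not the exact command used above. It assumes one character sequence per line on standard input; the function name count_sequences and the sample data are made up for illustration. LC_ALL=C is set so the comparison is byte-wise, and therefore case sensitive, whatever the locale.

```shell
#!/bin/sh
# Hypothetical helper: read one sequence per line on stdin and
# print each distinct line with its count, most frequent first.
count_sequences() {
    # LC_ALL=C forces byte-wise (case-sensitive) comparison.
    # uniq -c only collapses adjacent duplicates, so sort first.
    # The final sort orders by the leading count, largest first.
    LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -rn
}

# Tiny made-up sample; " and " and " And " are counted separately.
printf '%s\n' ' and ' 'the q' ' and ' ' And ' 'the q' ' and ' | count_sequences
```

On a file of several gigabytes, GNU sort spills to temporary files by itself. Its -T option selects the temporary directory and -S sets the memory buffer size, which may be worth adjusting if the default temp location is short on space.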