Hi Brendan

Wow, that is going to take some time ;)

Can you give us some examples of the data? How many lines are there?

Apart from using MS SQL Server and loading your data into the database, I would be looking at writing your own sorting program. You are never going to be able to do it all in one go; there just isn't enough memory in any reasonable computer :)

Perhaps an approach where small batches are worked on one at a time, like:

Phase 1: Pre-sort

Read the data file and work on manageable chunks one at a time. For instance, take 10000 lines, sort them in memory, then write them to a new file, creating a new file each time.

Phase 2: Merging

Open a number of the pre-sorted files and merge the data. You know that the lines are in order within each file, so you can read lines from one file until you get to a string that is "greater" than the current line from one of the other files; then you start reading from that file, and so on. Do this with, say, 10 files at a time and output a second batch of temp files, then merge those, then their results, and eventually you'll get one file.

Now you're going to have LOTS of fun with file access and buffering, but at least this way you don't need vast amounts of memory to get the job done.

Some hints:

- Allocate the memory used for sorting ONCE and reuse it; memory allocation is expensive.
- File access is slow, so let the OS do the buffering or, better yet, do your own and issue reads and writes in blocks to and from memory.

If you want to get really efficient, use IO Completion Ports in Win32 to schedule file IO in the background so your processing threads can be busy sorting data, not waiting on file IO.

Contact me off list if you want to talk about it more.

Cheers,
Ash.
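For what it's worth, the two phases described above could be sketched roughly like this in Python. This is only a minimal illustration under some assumptions, not a tuned implementation: the function names (write_sorted_runs, merge_runs, external_sort) are made up for the example, the chunk size of 10000 lines and the 10-file merge width are just the figures suggested above, and the standard library's heapq.merge stands in for the hand-written "read from one file until another file's line is smaller" loop. It also leans on Python's own file buffering rather than doing block IO or reusing one sort buffer.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_runs(src_path, tmp_dir, lines_per_run=10000):
    """Phase 1: sort fixed-size chunks in memory, one temp file per chunk."""
    runs = []
    with open(src_path, "rb") as src:
        while True:
            chunk = list(islice(src, lines_per_run))
            if not chunk:
                break
            # The last line of the input may lack a newline; add one so
            # lines from different files never run together when merged.
            if not chunk[-1].endswith(b"\n"):
                chunk[-1] += b"\n"
            # bytes compare by byte value, i.e. case-sensitive ASCII order
            chunk.sort()
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "wb") as out:
                out.writelines(chunk)
            runs.append(path)
    return runs


def merge_runs(runs, tmp_dir, fan_in=10):
    """Phase 2: merge up to fan_in sorted files at a time until one is left."""
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), fan_in):
            group = runs[i:i + fan_in]
            files = [open(p, "rb") for p in group]
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            try:
                with os.fdopen(fd, "wb") as out:
                    # heapq.merge lazily yields the smallest pending line
                    # across all the already-sorted input files.
                    out.writelines(heapq.merge(*files))
            finally:
                for f in files:
                    f.close()
            for p in group:
                os.remove(p)
            merged.append(path)
        runs = merged
    return runs[0] if runs else None


def external_sort(src_path, dst_path, lines_per_run=10000, fan_in=10):
    # Keep temp files next to the destination so the final rename stays
    # on one filesystem (an assumption made for this sketch).
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    with tempfile.TemporaryDirectory(dir=dst_dir) as tmp_dir:
        runs = write_sorted_runs(src_path, tmp_dir, lines_per_run)
        result = merge_runs(runs, tmp_dir, fan_in)
        if result is None:
            open(dst_path, "wb").close()  # empty input, empty output
        else:
            os.replace(result, dst_path)
```

Calling external_sort("unsorted.txt", "sorted.txt") (both file names hypothetical) would run both phases. Memory use is bounded by lines_per_run during phase 1 and by one buffered line per open file during phase 2, and the destination disk needs room for the temp files as well as the output.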
---
Ashley Roll
Digital Nemesis Pty Ltd
www.digitalnemesis.com
Mobile: +61 (0)417 705 718

> -----Original Message-----
> [mailto:PICLIST@MITVMA.MIT.EDU]On Behalf Of Brendan Moran
> Sent: Thursday, 28 November 2002 10:31 AM
> Subject: [OT]:Sorting large files
>
>
> Does anyone know of an algorithm for sorting extremely large files of
> linebreak-separated, case-sensitive ASCII values?
>
> I have a file that is 3,480,231kB of unsorted ASCII values,
> and I need it sorted... and, yes, I do have some idea of the time that this
> will take.