One way to compute the similarity of data is as follows:

1. turn each file into a binary image (a .bin file should work)
2. compute the DFFT of the file using a fixed 'window'
3. compute the CoG of the normalized DFFT of each window, using frequency and amplitude as a 2D space
4. sort the results from step 3, using the step-3 result of the reference file (the one being compared against)

Maybe I am not being very clear in my explanation; ask for more. The algorithm is used in image processing among other things (e.g. image recognition), and also in speech recognition and general pattern matching. Such an algorithm should exist somewhere already. Step 3 can be replaced by other types of calculations.

Peter

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
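The steps above could be sketched roughly like this in Python with NumPy. Note this is my own reading of the idea, not a reference implementation: the window size, the normalization, the exact meaning of "CoG in (frequency, amplitude) space", and the Euclidean distance used for the ranking in step 4 are all assumptions on my part.

```python
import numpy as np

def spectral_cog(data: bytes, window: int = 256):
    """Steps 1-3: split raw bytes into fixed windows, FFT each one,
    normalize the amplitude spectrum, and return one (freq, amp) CoG
    point per window. Window size 256 is an arbitrary choice."""
    x = np.frombuffer(data, dtype=np.uint8).astype(float)
    n = (len(x) // window) * window          # drop the ragged tail
    cogs = []
    for chunk in x[:n].reshape(-1, window):
        amp = np.abs(np.fft.rfft(chunk))     # amplitude spectrum of one window
        total = amp.sum()
        if total == 0:
            cogs.append((0.0, 0.0))
            continue
        w = amp / total                      # normalized spectrum as CoG weights
        freqs = np.arange(len(amp))
        f_cog = float((freqs * w).sum())     # CoG along the frequency axis
        a_cog = float((amp * w).sum())       # CoG along the amplitude axis
        cogs.append((f_cog, a_cog))
    return cogs

def similarity_rank(reference: bytes, candidates: dict, window: int = 256):
    """Step 4: rank candidates by CoG distance to the reference file.
    Euclidean distance over the per-window CoG points is an assumption."""
    ref = np.array(spectral_cog(reference, window))
    scores = {}
    for name, data in candidates.items():
        c = np.array(spectral_cog(data, window))
        m = min(len(ref), len(c))            # compare only overlapping windows
        scores[name] = float(np.linalg.norm(ref[:m] - c[:m]))
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Usage would be something like `similarity_rank(open("ref.bin", "rb").read(), {"a": open("a.bin", "rb").read()})`; an identical file scores 0 and sorts first.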