One way to compute the similarity of data is as follows:

1. turn each file into a binary image (a .bin file should work)
2. compute the DFFT of the file using a fixed 'window'
3. compute the CoG of the normalized DFFT of each window, using frequency and amplitude as a 2D space
4. sort the results from step 3, using the step-3 result of the reference file (the one being compared against)

Maybe I am not being very clear in my explanation; ask for more. The algorithm is used in image processing among other things (e.g. image recognition), and also in speech recognition and general pattern matching. Such an algorithm should exist somewhere already. Step 3 can be replaced by other types of calculations.

Peter

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
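The steps above could be sketched roughly like this in Python with NumPy. Note this is my own reading of the idea, not a reference implementation: the window size, the normalization, the exact meaning of "CoG in (frequency, amplitude) space", and the Euclidean distance used for the ranking in step 4 are all assumptions on my part.

```python
import numpy as np

def spectral_cog(data: bytes, window: int = 256):
    """Steps 1-3: split raw bytes into fixed windows, FFT each one,
    normalize the amplitude spectrum, and return one (freq, amp) CoG
    point per window. Window size 256 is an arbitrary choice."""
    x = np.frombuffer(data, dtype=np.uint8).astype(float)
    n = (len(x) // window) * window          # drop the ragged tail
    cogs = []
    for chunk in x[:n].reshape(-1, window):
        amp = np.abs(np.fft.rfft(chunk))     # amplitude spectrum of one window
        total = amp.sum()
        if total == 0:
            cogs.append((0.0, 0.0))
            continue
        w = amp / total                      # normalized spectrum as CoG weights
        freqs = np.arange(len(amp))
        f_cog = float((freqs * w).sum())     # CoG along the frequency axis
        a_cog = float((amp * w).sum())       # CoG along the amplitude axis
        cogs.append((f_cog, a_cog))
    return cogs

def similarity_rank(reference: bytes, candidates: dict, window: int = 256):
    """Step 4: rank candidates by CoG distance to the reference file.
    Euclidean distance over the per-window CoG points is an assumption."""
    ref = np.array(spectral_cog(reference, window))
    scores = {}
    for name, data in candidates.items():
        c = np.array(spectral_cog(data, window))
        m = min(len(ref), len(c))            # compare only overlapping windows
        scores[name] = float(np.linalg.norm(ref[:m] - c[:m]))
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Usage would be something like `similarity_rank(open("ref.bin", "rb").read(), {"a": open("a.bin", "rb").read()})`; an identical file scores 0 and sorts first.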