> Hopefully the obsessed photographers here

Ahhh, so OP is obsessed photographer, not original poster.  :-)

> Summary: Seeking opinions re duplicate-file location and management
> software. Needs to work with essentially unlimited capacity and number
> of files and number of connected or disconnected drives. "Actually
> working" is highly desirable. Free would be nice but is not essential.
> Emphasis is on photo files.

I'm very interested in solving that same problem. I have various scripts (messy, keyed to my idiosyncrasies) and ways of (trying to) ensure that I have at least 2 copies of each original photo on separate devices. I'm looking for more of an archive manager.

I also have ideas & an outline for a program to deal with it, but it's nowhere near completion (and unfunded development). The biggest stumbling block is coming up with a robust signature mechanism to identify nearly-identical or derivative works -- e.g. to differentiate between a JPEG that's been recompressed (i.e. identical) and one that has been edited (i.e. new work).

Matching bit-wise identical files is not trivial (partly due to file sizes) but not too difficult either. It also has to take care of files that moved between directory tree A and directory tree B. I think I've solved both of these parts.

> I have a large photo collection scattered across many hard drives of
> various capacity and vintage. There are also DVD and CD backup copies
> although (wisely or not) in recent years I've tended to use multiple
> HDDs rather than DVDs for backup. The older the files the more likely
> that there are numerous "lost" copies.

Wow, that exactly describes my environment too.  :-)

> All copies of a file that matter will share the original EXIF
> information. Original date/time is preserved to the maximum extent
> possible**. (Some copying or editing processes* destroy EXIF

Identical EXIF is likely but not always true (more below).

I think in terms of 2 categories -- archive and work.
Original images (now Raw, formerly JPEG) are always archive. (They _are_ my digital negatives.) Proofs sent to clients are work files. Final, printer-ready files are archive. Reduced-resolution versions or alternate crops, usually client-approval samples, are usually work files but may be archive class.

> It is not uncommon for Windows itself to not be able to properly
> handle the time/date format of files coming from eg cameras or
> flash cards and to move the time 12 hours or 1 day or swap
> day/month or play other games. In such cases the EXIF is usually
> untouched and [...] subsequently restore the correct values).

Is the correct time correct? One niggling problem is air travel. Well, actually, it's my inability to always perfectly execute a pre-planned process. I keep the camera on the departure time zone through the flight (an arbitrary decision that I use as a convention). Upon landing, I change the camera's clock to match current local time. I've found doing that makes it much easier to correlate photos with notes in my time planner, receipts, planned events, etc -- it helps me to identify a photo's where, what, and (sometimes) why.

Sometimes I forget to update one or all camera bodies on landing. Or I'm just too busy taking photos during deplaning and in the airport (or talking to security because I'm taking photos) -- and then I forget.

Having to keep a "time adjustment" for a batch of pictures is another issue that I'd like to address. But if I edit the EXIF time stamps, then the "original" and the "adjusted original" photos are the same image but not bit-wise identical. It goes back to a mechanism to compute a unique signature.

In no case am I willing to adjust the EXIF time stamp before I upload the original capture files from the camera cards to other media (onto 2 devices or 1 device & a backup DVD). DVD backups, obviously, can't be "fixed" if I update the EXIF time stamps.
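
To illustrate the signature idea for the EXIF time stamp case only: one possible approach is to hash a JPEG while skipping its APP1 segments, which is where EXIF (and XMP) normally live. The Python below is a minimal sketch under that assumption, not the program described above. The name image_signature is hypothetical, the code handles JPEG only (not Raw), it assumes every marker before start-of-scan carries a length field, and it does nothing for recompressed or otherwise re-encoded copies.

```python
import hashlib
import struct

SOI = b"\xff\xd8"   # start-of-image marker that opens every JPEG
SOS = 0xDA          # start of scan: compressed image data follows
APP1 = 0xE1         # segment that normally carries EXIF (and XMP)


def image_signature(jpeg_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of a JPEG, ignoring APP1 segments.

    Hypothetical helper for illustration; assumes a well-formed stream
    in which every marker before SOS is followed by a length field.
    """
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG stream")
    digest = hashlib.sha256()
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == SOS:
            # Everything from here to the end is image data: hash it all.
            digest.update(jpeg_bytes[pos:])
            return digest.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(jpeg_bytes):
            raise ValueError("truncated or malformed segment")
        if marker != APP1:
            digest.update(jpeg_bytes[pos:end])
        pos = end
    raise ValueError("no start-of-scan marker found")
```

Two files that differ only in their EXIF block should then share a signature, while the plain file hash still tells the untouched original from the adjusted copy. A tool that rewrites other segments while editing EXIF would defeat this, and the thumbnail embedded in the EXIF block is ignored along with the rest of APP1.
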
And, of course, my photo rate goes up by a multiplier (circa 10X) when I'm traveling -- likely a direct correlation between number of time zones changed and number of extra photos taken.

> Total originals are probably in the 100,000 - 1 million range,

I'm at the lower part of that range. You're crazy... well, crazier.  :-)

Lee Jones

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist