-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Jun 09, 2007 at 05:06:23PM -0300, Gerhard Fiedler wrote:
> I'm using Powermarks for a few years now (on Windows,
> ).
>
> For me, that's how bookmarks should work. When I store a bookmark, I add an
> ad hoc list of keywords (you may call them "tags" :). When searching for
> something, I just start typing what I think I might have added as keyword,
> and it filters the whole bookmark list as I type.

Yup, exactly what I'm thinking too. Of course, I'd do it with command-line
software; after using Linux for a good 7 years, Windows shareware looks
almost quaint!

Actually, a neat feature would be to have the bookmark program automatically
cache some of the page too, so you could search not only the tags but also
the content if you find your tags weren't quite up to snuff. Storage is
cheap, and it'd be a simple matter to grab the pages and turn them into
keywords.

Indeed, such a system could be useful to multiple individuals, so running a
centrally accessible server would make sense. It could even automatically
find new and hopefully interesting URLs, so users wouldn't have to find them
in the first place...

> Email could be similar, as could be file storage -- as could be /any/ type
> of storage, actually.
>
> Hierarchies just don't work for most things. Programs simply force their
> users into hierarchical storage schemes because they don't need a lot of
> thinking to implement :) Hierarchical file systems are complete nonsense.
> They had their time, like when Unix ran on the equivalent of a Z80 and
> maximizing efficiency was necessary to be able to just store something, but
> that's long over. It's a complete pain to have to decide whether an invoice
> belongs to the client dir, the project dir, the accounting dir, the tax
> dir, and so on.
>
> A reasonably structured system of tags (like GUIDs for the stuff that needs
> to be predictably found) together with complete free-form tags and a good
> search engine is what's needed.

Well, there are a few ways to identify information: hashes representing
exact content, UUIDs (GUID is a Microsoft-specific term) to attach arbitrary
unique identifiers, and paths and tags. Hashes are unique with respect to
content. UUIDs are unique with respect to logic: they name the item itself,
whatever its content becomes. And paths and tags... it all depends.

It wouldn't be a big deal to make a universal indexing backend that tied
into the regular file system. MIME types would make it easy to filter when
you only want to search application/email or application/bookmark (made-up
examples). Everything could tie into a UUID, path, or hash as appropriate. A
transition mechanism would then be to normally use paths as the Uniform
Resource Locator, but slowly move to UUIDs or hashes as appropriate.

For a really simple version, email and bookmarks could be indexed by the same
mechanism. Emails are immutable, and therefore indexed by hash. (You'll want
a hash -> path database to find the real file.) So an entry would go like
this:

URL: sha1sum 3b3c4cbe9c2812e1fd8597db2bc4341fb1c4f6e6
MIME: application/email
Tags: pile of tags, user defined
Content-Keywords: strip out every possible content keyword

The bookmark is mutable, so store it by UUID to allow updating. Again, a
UUID -> path database may be needed, but that becomes a problem if someone
changes the path structure.

URL: uuid eee15527-d6eb-490b-99b4-d705d4032fc9
MIME: application/bookmark
Tags: again, user defined
Content-Keywords: strip keywords from the website

The actual URL data isn't handled at that layer; as far as the database
knows, a bookmark is like any other bit of data.

Finally, let's try an mp3:

URL: sha1sum 7cbe1187d09e503316bf2858bae29f75788bb92f
MIME: application/music (mp3, why mp3? could be ogg etc.)
Tags: user defined
Content-Keywords: hmm...
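
As a rough illustration of the entries above, here is a minimal Python sketch
of such an index. It is only one way this could look, not a worked-out
design: the names Entry, extract_keywords, index_immutable, index_mutable and
search are made up for this example, the keyword stripping is deliberately
naive, and the hash -> path and UUID -> path databases are left out.

```python
import hashlib
import re
import uuid
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One index record, mirroring the URL / MIME / Tags / Content-Keywords layout."""
    url: str                                  # "sha1sum <hex>" or "uuid <uuid>"
    mime: str                                 # e.g. "application/email" (made-up type)
    tags: set = field(default_factory=set)
    content_keywords: set = field(default_factory=set)


def extract_keywords(text):
    """Naive keyword stripping (an assumption): lowercase runs of 3+ letters."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def index_immutable(content, mime, tags):
    """Immutable data (an email, say) is identified by the SHA-1 of its bytes."""
    digest = hashlib.sha1(content).hexdigest()
    text = content.decode("utf-8", errors="ignore")
    return Entry("sha1sum " + digest, mime,
                 {t.lower() for t in tags}, extract_keywords(text))


def index_mutable(text, mime, tags):
    """Mutable data (a bookmark, say) gets a random UUID so it can be updated later."""
    return Entry("uuid " + str(uuid.uuid4()), mime,
                 {t.lower() for t in tags}, extract_keywords(text))


def search(entries, word, mime=None):
    """Return entries whose tags or content keywords contain `word`,
    optionally restricted to a single MIME type."""
    word = word.lower()
    return [e for e in entries
            if (mime is None or e.mime == mime)
            and (word in e.tags or word in e.content_keywords)]
```

Calling search(entries, "invoice", mime="application/email") would then look
only at emails, while leaving out the mime argument searches everything the
index knows about, whatever its type.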

For the mp3, you have to decide where the obvious band, album, and song data
goes. Content-Keywords? Hard to say.

Anyway, lots of engineering decisions. But it's the sort of system that could
provide an eventual upgrade path to filesystems that actually do this kinda
thing natively. I mean, the path is, in a sense, just a special kind of tag.

In some scenarios the hash alone would be enough to find files as well: some
types of filesharing networks are effectively content-addressable
filesystems, for instance, as are some revision control systems, monotone for
example.

The key thing is to make sure the base filesystem layer doesn't really have
to know all that much, and to keep things namespaced in useful ways; MIME
types may be a good model to start with.

- -- 
http://petertodd.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGbVB/3bMhDbI9xWQRAlzJAKCCeINkL3H8uodVC0jBsHMRsZK3ywCfdpng
iwZwkv1evka0YC2I9ElRaJg=
=4TtI
-----END PGP SIGNATURE-----