-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Jun 09, 2007 at 05:06:23PM -0300, Gerhard Fiedler wrote:
> I'm using Powermarks for a few years now (on Windows,
> <http://www.kaylon.com/power.html>). 
> 
> For me, that's how bookmarks should work. When I store a bookmark, I add an
> ad hoc list of keywords (you may call them "tags" :). When searching for
> something, I just start typing what I think I might have added as keyword,
> and it filters the whole bookmark list as I type. 

Yup, exactly what I'm thinking too. Of course, I'd do it with command
line software. After using Linux for a good 7 years, Windows shareware
looks almost quaint!

Actually, a neat feature would be to have the bookmark program
automatically cache some of the page too, so not only could you search
for tags, you could search for content too if you find your tags weren't
quite up to snuff. Storage is cheap and it'd be a simple matter to grab
the pages and turn them into keywords.
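
A rough sketch of the "turn the pages into keywords" step, in Python.
The function name, the tiny stopword list and the rule of dropping
one-character words are all assumptions for illustration; it only uses
the standard library and makes no attempt at ranking keywords.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}


class TextExtractor(HTMLParser):
    """Collects the text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def content_keywords(html):
    """Return the set of lowercase words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return {w for w in words if len(w) > 1 and w not in STOPWORDS}
```

The resulting set could be stored next to the user's own tags, so a
search falls back on page content when the tags miss.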

Indeed, such a system could be useful to multiple people, so running
it on a centrally accessible server would make sense. It could even
automatically find new and hopefully interesting URLs, so users wouldn't
have to find them in the first place...

> Email could be similar, as could be file storage -- as could be /any/ type
> of storage, actually.
>
> Hierarchies just don't work for most things. Programs simply force their
> users into hierarchical storage schemes because they don't need a lot of
> thinking to implement :)  Hierarchical file systems are complete nonsense.
> They had their time, like when Unix ran on the equivalent of a Z80 and
> maximizing efficiency was necessary to be able to just store something, but
> that's long over. It's a complete pain to have to decide whether an invoice
> belongs to the client dir, the project dir, the accounting dir, the tax
> dir, and so on. 
> 
> A reasonably structured system of tags (like GUIDs for the stuff that needs
> to be predictably found) together with complete free-form tags and a good
> search engine is what's needed.

Well, there are a few ways to identify information: hashes representing
exact content, UUIDs (GUID is the Microsoft-specific term) to attach
arbitrary unique identifiers, and paths and tags.

Hashes are unique with respect to content.

UUIDs are unique with respect to logical identity: the same ID keeps
referring to the same thing even when its content changes.

And paths and tags... it all depends.

It wouldn't be a big deal to make a universal indexing backend that tied
into the regular file system. MIME types would make it easy to filter
when you only want to search for application/email or
application/bookmark (made up examples). Everything could tie into a
UUID, path, or hash as appropriate.
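
To make the filtering idea concrete, here is a minimal Python sketch.
The Entry record and the search() function are hypothetical names, and
the field layout (an identifier, a MIME type, tags, content keywords)
is just one assumed way to organise such an index. Tags and keywords
are assumed to be stored in lowercase.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    url: str    # "sha1sum <hex>", "uuid <uuid>" or a plain path
    mime: str   # e.g. the made-up "application/bookmark"
    tags: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def search(entries, mime=None, terms=()):
    """Return the entries of the given MIME type (any type if None)
    whose tags or content keywords contain every search term."""
    wanted = {t.lower() for t in terms}
    hits = []
    for e in entries:
        if mime is not None and e.mime != mime:
            continue
        if wanted <= (e.tags | e.keywords):
            hits.append(e)
    return hits
```

Calling search(entries, mime="application/bookmark", terms=["pic"])
would then return only the bookmarks tagged or keyworded "pic", while
leaving mime out searches everything.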

A transition mechanism, then, would be to use paths as the Uniform
Resource Locator at first, and slowly move to UUIDs or hashes as
appropriate.

For a really simple version, email and bookmarks could be indexed by the
same mechanism.

Emails are immutable, and are therefore indexed by hash. (You'll want a
hash -> path database to find the real file.)

So an entry would go like this:

URL: sha1sum 3b3c4cbe9c2812e1fd8597db2bc4341fb1c4f6e6
MIME: application/email
Tags: pile of tags, user defined
Content-Keywords: strip out every possible content keyword
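
As a sketch of how such an entry and the hash -> path database might
be built (Python; the HashIndex class and its method names are made up
for illustration, and plain dicts stand in for a real database):

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest of the raw message bytes."""
    return hashlib.sha1(data).hexdigest()


class HashIndex:
    def __init__(self):
        self.entries = {}  # sha1 hex -> index entry
        self.paths = {}    # sha1 hex -> path of the real file

    def add_email(self, path, raw, tags):
        digest = sha1_of(raw)
        self.entries[digest] = {
            "URL": "sha1sum " + digest,
            "MIME": "application/email",  # made-up type, as above
            "Tags": set(tags),
        }
        # The same content always maps to the same key, so re-adding
        # a moved file simply records its new location.
        self.paths[digest] = path
        return digest

    def locate(self, digest):
        """Look up where the file with this hash currently lives."""
        return self.paths[digest]
```

Because the key is derived from the content, moving the file only
requires updating the paths table; the entry itself never changes.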

The bookmark is mutable, so store it by UUID to allow updating.
Again, a UUID -> path database may be needed, but that becomes a problem
if someone changes the path structure.

URL: uuid eee15527-d6eb-490b-99b4-d705d4032fc9
MIME: application/bookmark
Tags: again, user defined
Content-Keywords: strip keywords from the website

The actual URL data isn't handled at that layer. As far as the database
knows, a bookmark is like any other bit of data.
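
A small Python sketch of the mutable case, with hypothetical names
throughout. The index only holds the metadata; the bookmark's address
sits in a separate payload table that the index treats as opaque data.
Random (version 4) UUIDs are an assumption, any UUID flavour would do.

```python
import uuid


class BookmarkIndex:
    def __init__(self):
        self.entries = {}  # UUID string -> index entry (metadata only)
        self.payload = {}  # UUID string -> bookmark data, opaque here

    def add(self, address, tags):
        ident = str(uuid.uuid4())
        self.entries[ident] = {
            "URL": "uuid " + ident,
            "MIME": "application/bookmark",  # made-up type, as above
            "Tags": set(tags),
        }
        self.payload[ident] = address
        return ident

    def update(self, ident, address=None, tags=None):
        """Change the data or tags; the identifier stays the same."""
        if address is not None:
            self.payload[ident] = address
        if tags is not None:
            self.entries[ident]["Tags"] = set(tags)
        return self.entries[ident]
```

Unlike the hash case, the key says nothing about the content, so
anything that refers to the UUID keeps working after an update.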

Finally, let's try an mp3:

URL: sha1sum 7cbe1187d09e503316bf2858bae29f75788bb92f
MIME: application/music (mp3, why mp3? could be ogg etc.)
Tags: user defined
Content-Keywords: hmm...

In this case you have to decide where the obvious band, album, and song
data goes. Content-Keywords? Hard to say.


Anyway, lots of engineering decisions. But it's the sort of system that
could provide an eventual upgrade path to filesystems that actually do
this kind of thing natively. I mean, the path, in a sense, is just a
special kind of tag. In some scenarios the hash would be enough to find
files as well; some types of filesharing networks are effectively
content addressable filesystems, for instance. So are some revision
control systems, monotone for example.
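
One way to read "the path is just a special kind of tag", sketched in
Python: split an existing path into its components and treat each one
as a tag. The path_tags() name and the lowercasing are assumptions,
and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def path_tags(path):
    """Turn each component of a POSIX-style path into a lowercase tag."""
    parts = PurePosixPath(path).parts
    return {p.lower() for p in parts if p != "/"}
```

With that, an invoice filed as /clients/acme/invoice.pdf and one filed
as /acme/clients/invoice.pdf end up with the same tags, so the order of
the directories stops mattering for search.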

The key thing is to make sure the base filesystem layer doesn't really
have to know all that much, and to keep things namespaced in useful
ways. MIME types may be a good model to start with.

- -- 
http://petertodd.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGbVB/3bMhDbI9xWQRAlzJAKCCeINkL3H8uodVC0jBsHMRsZK3ywCfdpng
iwZwkv1evka0YC2I9ElRaJg=
=4TtI
-----END PGP SIGNATURE-----
-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist