--- Jan-Erik Soderholm wrote:
> Ben Hencke wrote:
> > I would really recommend that you stay away from hashing at all.
> > Never hash a serial number that already fits into a reasonable
> > amount of space. Hashing is really only useful when trying to take
> > a large key of data (i.e. a string, file, whole record, etc.) and
> > make it into a new key that can be easily sorted/indexed/etc.
>
> A hash index isn't meant to keep records "sorted" at all.

A "hash" can be used for all sorts of things, including as a key into
a sorted array when it is impractical to sort on the original key
itself. A hash table IS sorted, with the exception of the collisions.
You use a hashing function to reduce the data to a smaller key or
index and store the data (or a pointer to the data) in a simple table
or array. Each element is sorted by its index/hash by virtue of its
position in the array, and you can access any record in one I/O
because the hash value tells you exactly where in the array it is.

See http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
It has a nice animation too :)

IIRC the maximum simultaneous number of records required was 100,000
or so. The smallest hash table that could hold that many records would
need a 17-bit hash value (2^17 = 131,072 slots). That is still
something like a 76% load factor (100,000 / 131,072), even with a
perfect hash function. Most hash tables are an array of pointers to
records, but for optimization's sake I would put the whole record into
the array instead.

The OP won't be able to use a flat (open-addressing) rehash method for
collision handling, because every deletion would require a rehash of
the table. You can optimize the rehash a little with some flags, but
it will still be a nightmare wrapped in a bad dream. So linked lists
or some other chaining method would have to be used for each record
that collides, further adding to the layers of complexity and storage
requirements. (There's a sketch of what chaining looks like at the end
of this message.)

> A hash index should be used when you need optimum performance in
> retrieving a single record from a table. A properly designed hash
> table/index should be able to retrieve a specific, fully qualified
> record in one single I/O. A B-tree needs as many I/Os as there are
> currently "levels" in the tree.
>
> On the other hand, if you need range retrievals (using part of the
> key), a B-tree is far better and a hash index sucks...
>
> This applies to databases in general, maybe not to the case at hand.
>
> So there is no "best" here; it, as usual, depends.

Agreed :-)

> And the size of the key has not much to do with this decision.

I disagree; the size of the key has a lot to do with picking the
algorithm to use. If the key were 8 bits, no one would be arguing
about what method to use ;-) If the key were 10K of data (maybe a data
file or a sound sample), you would want to hash it down to something
reasonable first.

> > A card # that is 1 to 10 million already fits nicely into 24 bits.
> > Hashing this small amount of data will only complicate things.
>
> Maybe, but a hash gives the fastest retrieval of the record.

A flat array with an element for each of the 10 million cards would
give the fastest access, since even hash tables need to deal with
collisions. A hash table sized to fit exactly would have horrible
performance as it approached capacity; you would need a significantly
oversized table to get decent performance while running near capacity.

I don't think fast access times are required here anyway. The OP is
using this for a vending machine, so taking a second to look a card up
is probably acceptable. It all really depends on the application.
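
To put some code behind the chaining discussion above, here is a
minimal sketch in C. The 17-bit table, the record layout, and the hash
function are all my own illustrative assumptions, not anything from
the OP's actual design:

    /* Minimal chained hash table -- a sketch only. */
    #include <stdlib.h>
    #include <stdint.h>

    #define TABLE_BITS 17                   /* 2^17 = 131072 slots */
    #define TABLE_SIZE (1u << TABLE_BITS)

    struct record {
        uint32_t card;              /* 24-bit card number           */
        struct record *next;        /* chain of colliding records   */
    };

    static struct record *table[TABLE_SIZE];

    /* Fold a 24-bit card number down to a 17-bit table index. */
    static uint32_t hash(uint32_t card)
    {
        return (card ^ (card >> TABLE_BITS)) & (TABLE_SIZE - 1u);
    }

    static void card_insert(uint32_t card)
    {
        struct record *r = malloc(sizeof *r); /* error check omitted */
        uint32_t i = hash(card);
        r->card = card;
        r->next = table[i];         /* push onto front of the chain */
        table[i] = r;
    }

    static struct record *card_lookup(uint32_t card)
    {
        struct record *r;
        for (r = table[hash(card)]; r != NULL; r = r->next)
            if (r->card == card)
                return r;
        return NULL;                /* not present */
    }

    /* Deletion is a plain unlink -- no rehash of the table needed,
     * which is exactly why chaining beats a flat rehash scheme. */
    static void card_delete(uint32_t card)
    {
        struct record **pp = &table[hash(card)];
        while (*pp != NULL) {
            if ((*pp)->card == card) {
                struct record *dead = *pp;
                *pp = dead->next;
                free(dead);
                return;
            }
            pp = &(*pp)->next;
        }
    }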
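
And on the key-size point: when the key really is large, something
like FNV-1a (32-bit variant, standard published constants) boils it
down to a small fixed-size value that you can then mask to whatever
table size you like. Again just a sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a: hash an arbitrary-size key one byte at a time. */
    static uint32_t fnv1a(const void *key, size_t len)
    {
        const unsigned char *p = key;
        uint32_t h = 2166136261u;   /* FNV offset basis */
        while (len--) {
            h ^= *p++;              /* xor in the next byte...      */
            h *= 16777619u;         /* ...then multiply by the prime */
        }
        return h;                   /* mask down to table size */
    }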
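
For comparison, the flat-array idea in its cheapest form: if the only
question is "is this card currently valid?", one bit per possible card
number does it with no hashing and no collisions at all. MAX_CARDS and
the function names here are my own assumptions:

    #include <stdint.h>

    #define MAX_CARDS 10000000u

    /* One validity bit per card: 1,250,000 bytes total. */
    static uint8_t valid[(MAX_CARDS + 7u) / 8u];

    static void card_add(uint32_t card)
    {
        valid[card >> 3] |= (uint8_t)(1u << (card & 7u));
    }

    static void card_remove(uint32_t card)
    {
        valid[card >> 3] &= (uint8_t)~(1u << (card & 7u));
    }

    static int card_is_valid(uint32_t card)
    {
        return (valid[card >> 3] >> (card & 7u)) & 1u;
    }

That's 1.25 MB, which obviously lives in external EEPROM or flash
rather than on-chip RAM on a PIC, but a lookup is still a single byte
read at a computed address.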
Getting back to my recommendation: I favor the sorted linked list
because it is simple to implement and very storage efficient. In this
application it is also easy to speed up with a few tweaks. I wouldn't
use it for anything that required near real-time lookup, though.

ttyl,
Ben Hencke
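
P.S. For anyone who wants to play with the sorted-list idea, a minimal
sketch (my own illustration, again, not the OP's code):

    #include <stdlib.h>
    #include <stdint.h>

    struct node {
        uint32_t card;
        struct node *next;
    };

    /* Insert, keeping the list in ascending card-number order. */
    static void list_insert(struct node **head, uint32_t card)
    {
        struct node **pp = head;
        struct node *n;
        while (*pp != NULL && (*pp)->card < card)
            pp = &(*pp)->next;
        n = malloc(sizeof *n);      /* error check omitted */
        n->card = card;
        n->next = *pp;
        *pp = n;
    }

    /* A failed lookup stops as soon as it walks past where the
     * card would have been -- the payoff of keeping it sorted. */
    static int list_find(const struct node *head, uint32_t card)
    {
        while (head != NULL && head->card < card)
            head = head->next;
        return head != NULL && head->card == card;
    }

The early bail-out on misses is one of those cheap tweaks: on average
it halves the walk for cards that aren't in the list.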