--- Jan-Erik Soderholm wrote:
> Ben Hencke wrote:
> > I would really recommend that you stay away from hashing at all.
> > Never hash a serial number that already fits into a reasonable
> > amount of space. Hashing is really only useful when trying to take
> > a large key of data (i.e. a string, file, whole record, etc.) and
> > make it into a new key that can be easily sorted/indexed/etc.
>
> A hash index isn't meant to keep records "sorted" at all.

A "hash" can be used for all sorts of things, including as a key into
a sorted array when it is impractical to sort on the original key
itself. A hash table IS sorted, with the exception of the collisions.
You use a hashing function to reduce the data to a smaller key or
index and store the data (or a pointer to the data) in a simple table
or array. Each element is sorted by its index/hash by virtue of its
position in the array, and you can access any record in one I/O
because the hash value tells you exactly where in the array it is.

See http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
It has a nice animation too :)

IIRC the maximum simultaneous number of records required was 100,000
or so. The smallest hash table that could hold that many records would
need a 17-bit hash value (2^17 = 131,072 slots). That is still
something like a 76% load factor (100,000 / 131,072), even with a
perfect hash function. Most hash tables are an array of pointers to
records, but for optimization's sake I would put the whole record into
the array instead.

The OP won't be able to use a flat (open-addressing) rehash method for
collision handling, because every deletion would require a rehash of
the table. You can optimize the rehash a little with some flags, but
it will still be a nightmare wrapped in a bad dream. So linked lists
or some other chaining method would have to be used for each record
that collides, further adding to the layers of complexity and storage
requirements. (There's a sketch of what chaining looks like at the end
of this message.)

> A hash index should be used when you need optimum performance in
> retrieving a single record from a table. A properly designed hash
> table/index should be able to retrieve a specific, fully qualified
> record in one single I/O. A B-tree needs as many I/Os as there are
> currently "levels" in the tree.
>
> On the other hand, if you need range retrievals (using part of the
> key), a B-tree is far better and a hash index sucks...
>
> This applies to databases in general, maybe not to the case at hand.
>
> So there is no "best" here; it, as usual, depends.

Agreed :-)

> And the size of the key has not much to do with this decision.

I disagree; the size of the key has a lot to do with picking the
algorithm to use. If the key were 8 bits, no one would be arguing
about what method to use ;-) If the key were 10K of data (maybe a data
file or a sound sample), you would want to hash it down to something
reasonable first.

> > A card # that is 1 to 10 million already fits nicely into 24 bits.
> > Hashing this small amount of data will only complicate things.
>
> Maybe, but a hash gives the fastest retrieval of the record.

A flat array with an element for each of the 10 million cards would
give the fastest access, since even hash tables need to deal with
collisions. A hash table sized to fit exactly would have horrible
performance as it approached capacity; you would need a significantly
oversized table to get decent performance while running near capacity.

I don't think fast access times are required here anyway. The OP is
using this for a vending machine, so taking a second to look a card up
is probably acceptable. It all really depends on the application.
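
To put some code behind the chaining discussion above, here is a
minimal sketch in C. The 17-bit table, the record layout, and the hash
function are all my own illustrative assumptions, not anything from
the OP's actual design:

    /* Minimal chained hash table -- a sketch only. */
    #include <stdlib.h>
    #include <stdint.h>

    #define TABLE_BITS 17                   /* 2^17 = 131072 slots */
    #define TABLE_SIZE (1u << TABLE_BITS)

    struct record {
        uint32_t card;              /* 24-bit card number           */
        struct record *next;        /* chain of colliding records   */
    };

    static struct record *table[TABLE_SIZE];

    /* Fold a 24-bit card number down to a 17-bit table index. */
    static uint32_t hash(uint32_t card)
    {
        return (card ^ (card >> TABLE_BITS)) & (TABLE_SIZE - 1u);
    }

    static void card_insert(uint32_t card)
    {
        struct record *r = malloc(sizeof *r); /* error check omitted */
        uint32_t i = hash(card);
        r->card = card;
        r->next = table[i];         /* push onto front of the chain */
        table[i] = r;
    }

    static struct record *card_lookup(uint32_t card)
    {
        struct record *r;
        for (r = table[hash(card)]; r != NULL; r = r->next)
            if (r->card == card)
                return r;
        return NULL;                /* not present */
    }

    /* Deletion is a plain unlink -- no rehash of the table needed,
     * which is exactly why chaining beats a flat rehash scheme. */
    static void card_delete(uint32_t card)
    {
        struct record **pp = &table[hash(card)];
        while (*pp != NULL) {
            if ((*pp)->card == card) {
                struct record *dead = *pp;
                *pp = dead->next;
                free(dead);
                return;
            }
            pp = &(*pp)->next;
        }
    }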
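
And on the key-size point: when the key really is large, something
like FNV-1a (32-bit variant, standard published constants) boils it
down to a small fixed-size value that you can then mask to whatever
table size you like. Again just a sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a: hash an arbitrary-size key one byte at a time. */
    static uint32_t fnv1a(const void *key, size_t len)
    {
        const unsigned char *p = key;
        uint32_t h = 2166136261u;   /* FNV offset basis */
        while (len--) {
            h ^= *p++;              /* xor in the next byte...      */
            h *= 16777619u;         /* ...then multiply by the prime */
        }
        return h;                   /* mask down to table size */
    }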
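
For comparison, the flat-array idea in its cheapest form: if the only
question is "is this card currently valid?", one bit per possible card
number does it with no hashing and no collisions at all. MAX_CARDS and
the function names here are my own assumptions:

    #include <stdint.h>

    #define MAX_CARDS 10000000u

    /* One validity bit per card: 1,250,000 bytes total. */
    static uint8_t valid[(MAX_CARDS + 7u) / 8u];

    static void card_add(uint32_t card)
    {
        valid[card >> 3] |= (uint8_t)(1u << (card & 7u));
    }

    static void card_remove(uint32_t card)
    {
        valid[card >> 3] &= (uint8_t)~(1u << (card & 7u));
    }

    static int card_is_valid(uint32_t card)
    {
        return (valid[card >> 3] >> (card & 7u)) & 1u;
    }

That's 1.25 MB, which obviously lives in external EEPROM or flash
rather than on-chip RAM on a PIC, but a lookup is still a single byte
read at a computed address.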
Getting back to my recommendation: I favor the sorted linked list
because it is simple to implement and very storage efficient. In this
application it is also easy to speed up with a few tweaks. I wouldn't
use it for anything that required near real-time lookup, though.

ttyl,
Ben Hencke
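
P.S. For anyone who wants to play with the sorted-list idea, a minimal
sketch (my own illustration, again, not the OP's code):

    #include <stdlib.h>
    #include <stdint.h>

    struct node {
        uint32_t card;
        struct node *next;
    };

    /* Insert, keeping the list in ascending card-number order. */
    static void list_insert(struct node **head, uint32_t card)
    {
        struct node **pp = head;
        struct node *n;
        while (*pp != NULL && (*pp)->card < card)
            pp = &(*pp)->next;
        n = malloc(sizeof *n);      /* error check omitted */
        n->card = card;
        n->next = *pp;
        *pp = n;
    }

    /* A failed lookup stops as soon as it walks past where the
     * card would have been -- the payoff of keeping it sorted. */
    static int list_find(const struct node *head, uint32_t card)
    {
        while (head != NULL && head->card < card)
            head = head->next;
        return head != NULL && head->card == card;
    }

The early bail-out on misses is one of those cheap tweaks: on average
it halves the walk for cards that aren't in the list.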