In computer science, a Ideally the hash function should map each possible key to a different slot index, but this ideal is rarely achievable in practice. Most hash table designs assume that hash collisions — pairs of different keys with the same hash values — are normal occurrences and must be accommodated in some way. In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at constant average (indeed,amortized) cost per operation. In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches and sets. Hash tables should not be confused with the hash lists and hash trees used in cryptography and data transmission.
Hash functionAt the heart of the hash table algorithm is a simple array of items; this is often simply called the
hash table. Hash table algorithms calculate an index from the data item's key and use this index to place the data into the array. The implementation of this calculation is the hash function, f: index = f(key,arrayLength)The hash function calculates an
index within the array from the data key. arrayLength is the size of the array.Choosing a good hash functionA good hash function and implementation algorithm are essential for good hash table performance, but may be difficult to achieve
^{. }Poor hashing usually degrades hash table performance by a constant factor but hashing is often only a small part of the overall computation.A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution increases the number of collisions, and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically by the chi-squared test.
The distribution needs to be uniform only for the table sizes
s that will occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of s, the hash function needs to be uniform only when s is a power of two. On the other hand, some hashing algorithms provide uniform hashes only when s is a prime number.For open addressing schemes, the hash function should also avoid Cryptographic hash functions are believed to provide good hash functions for any table size Some authors claim that good hash functions should have the avalanche effect; that is, a single-bit change in the input key should affect, on average, half the bits in the output. Some popular hash functions fail to pass this test.
Perfect hash functionIf all of the keys that will be used are known ahead of time, a perfect hash function can be used to create a perfect hash table, in which there will be no collisions. If minimal perfect hashing is used, every location in the hash table can be used as well. Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most chaining and open addressing methods, where the time for lookup is low on average, but may be very large (proportional to the number of entries) for some sets of keys. |

Algorithms > Hashing >