Statistics

Test results updated March 18, 2008

All results generated on an Intel Core 2 Duo @ 2.4ghz.

MurmurHash 2.0 test results

MurmurHashAligned 2.0 test results

Some explanation is in order -

Bulk Speed Tests are done 9999 times each on 256k blocks, and I report the fastest result out of all the trials. The large number of trials are required so that we have a good chance of getting a "clean" test on a machine that may have large numbers of background processes running that could interrupt the test and change the results. The block size was chosen to fit in most all L2 caches and as a compromise between larger blocks which would show higher bulk speed but whose tests would be more likely to be interrupted versus smaller blocks whose tests would show an increasing amount of testing overhead.

Differential Tests indicate weak points in the hashing algorithm. Differentials are listed by bit mask and percentage, with the percentage indicating the probability of two keys that differ only by that bit mask colliding. For example, if a particular mask was 01100 and the percentage was 50%, then any pair of keys differing in those bits (11111 and 10011, 00000 and 01100, 00011 and 01111, etcetera) have a 50% chance of producing the same hash value. SuperFastHash in particular has a large number of differentials, and their frequency is reflected in the list of dictionary collisions - many word pairs that differ only in a few characters end up colliding.

Dictionary Speed Tests are repeated 999 times for similar reasons as the bulk test. In addition, I compute and subtract out the overhead of iterating over all 470k words in the dictionary and dispatching a function for each word. I do this to try and isolate the time spent inside the hash function itself - In the real world a hash function would likely be inlined and called using some data the CPU has already pulled out of an array somewhere, so by removing those sources of overhead I produce a better overall comparsion.

Chi Deviation represents how far a distribution is from ideally random. A chi deviation of 1.0 indicates perfectly random behavior, higher numbers indicate poorer distribution. Zero would indicate that all buckets in the hash table contain exactly the same number of items, which isn't possible with a "real" hash function but might happen in certain unusual cases. I calculate the chi deviation by computing the chi-square value for a distribution and dividing by the number of buckets - this doesn't map directly to standard deviation or variance, but has the nice property that it produces consistent results for different hash table sizes.

The chi-square test is the most sensitive test and requires a large number of trials to produce a meaningful result, but it will identify distributions that differ even slightly from perfectly random. In order to generate different trials without changing the keyset, we pass a different seed from 0 to 99 to the hash function in each trial.

Expected work deviation represents the difference in the expected average number of cell checks needed to find a key when compared with a perfectly random distribution - a work score of +25.0% indicates that hash table lookups for that combination of hash function / keyset / table size will take roughly 25% longer than expected.

The expected work is the least sensitive test - even statistically poor distributions can produce acceptible work scores. It's also one of the most practically useful - suppose I'm using hash A in my application and it has a work score of +10% for my keyset. If hash B has a work score of +0.1%, it's a safe bet to say that using hash B will produce improved performance in my app. On the other hand, if hash A has a work score of 0.1% on my keyset, even if it's a statistically poor hash I probably won't get much additional performance by changing it.

Avalanche Diagrams represent the effect that flipping one bit in a key has on the output of the hash function. I've generated a number of them just for fun, you can find them here.