MurmurHash


By Austin Appleby (aappleby (AT) gmail)

Update March 1, 2011 - The 32-bit variant of MurmurHash3 is finalized, and this page is deprecated.

Update January 25, 2011 - I'm now employed by Google, and will be working on better cross-platform support for SMHasher and Murmur in the near future.

Update November 4, 2010 - MurmurHash3 has been released in beta form here - http://code.google.com/p/smhasher/ - (I reserve the right to tweak the constants after people have had a chance to bang on it). Murmur3 has better performance than MurmurHash2, no repetition flaw, comes in 32/64/128-bit versions for both x86 and x64 platforms, and the 128-bit x64 version is blazing fast - over 5 gigabytes per second on my 3 gigahertz Core 2.

In addition, the library of test code that I use to test MurmurHash (called SMHasher) has been released - it's still rough (and will only compile under VC++ at the moment), but it contains everything needed to verify hash functions of arbitrary output bit-lengths.

Murmur3 and all future versions will be hosted on Google Code here - http://code.google.com/p/smhasher/ - you can access the codebase via the 'Source' tab at the top.

Update October 28, 2010 - A couple of people have reported a flaw in MurmurHash2, and I've confirmed it - repetitions of 4-byte patterns have a much higher chance of collision than expected. This is not fixable, but should not cause problems in most cases (it applies only to keys that are identical except for the repeating sections and with those sections 4-byte-aligned). While investigating the flaw I've found a new mix function that improves the performance yet again, so I'll be publishing that as MurmurHash3.

Update November 16, 2009 - One of MurmurHash2A's users found a small bug in the sample C++ implementation that caused the C++ and C versions to produce different hashes for blocks whose size was not a multiple of four. Bug fixed, code updated. 


Extremely simple - compiles down to ~52 instructions on x86.

Excellent distribution - Passes chi-squared tests for practically all keysets & bucket sizes.

Excellent avalanche behavior - Maximum bias is under 0.5%.

Excellent collision resistance - Passes Bob Jenkin's frog.c torture-test. No collisions possible for 4-byte keys, no small (1- to 7-bit) differentials.

Excellent performance - measured on an Intel Core 2 Duo @ 2.4 ghz

    OneAtATime - 354.163715 mb/sec
    FNV - 443.668038 mb/sec
    SuperFastHash - 985.335173 mb/sec
    lookup3 - 988.080652 mb/sec
    MurmurHash 1.0 - 1363.293480 mb/sec
    MurmurHash 2.0 - 2056.885653 mb/sec

Source code
    for the simple implementation - MurmurHash2.cpp
    for the simple implementation w/ Merkle-Damgard-esque construction - MurmurHash2A.cpp
    for the 64-bit version - MurmurHash2_64.cpp
    for the little-endian aligned-read-only implementation - MurmurHashAligned2.cpp
    for the (slower) endian-neutral implementation - MurmurHashNeutral2.cpp

All code is released to the public domain. For business purposes, Murmurhash is
under the MIT license.

If you want MurmurHash 1.0, the source is still available - simple and aligned

All detailed testing results have been moved to the Statistics page.

The name, if you're wondering, comes from the simplest sequence of operations which will thoroughly mix the bits of a value - "x *= m; x = rotate_left(x,r);" - multiply and rotate. Repeat that about 15 times using 'good' values of m and r, and x will end up pseudo-randomized. Unfortunately multiply+rotate has a few major weaknesses when used in a hash function, so I used multiply+shift+xor. I liked the name Murmur better than Musxmusx, so I kept it.

I've added a random brain-dump discussion page, where I'll put more info about how I found the hash constants, test methodology, etc.