Pseudorandom Ordering for a Memory Efficient Hash Table

Wolfgang Brehm

Sorted arrays are most memory efficient

The most memory efficient data structure for storing and accessing uniformly distributed integers is a sorted array, using interpolation search to retrieve values in O(log₂(log₂(n))) steps. If more memory is available, some spots can be left empty, yielding a collection of patches that are sorted internally and with respect to each other. When looking for an element the search essentially only has to take place within one patch, because the correct patch is found in the first step. The complexity therefore becomes independent of the number of elements stored in the table and instead depends only on the load factor: the ratio of the number of stored elements to the number of available spots, the so-called buckets.
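To give an idea of what interpolation search looks like, here is a minimal sketch for a fully sorted array of uniformly distributed 32-bit keys; it is illustrative only and not the patchmap code:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Interpolation search: guess the position of `key` from its value,
// assuming the keys are roughly uniformly distributed in the sorted array.
std::optional<std::size_t> interpolation_search(
    const std::vector<std::uint32_t>& a, std::uint32_t key) {
  std::size_t lo = 0, hi = a.size();                  // search in [lo, hi)
  while (lo < hi) {
    if (key < a[lo] || key > a[hi - 1]) return std::nullopt;
    // Estimate the position by linear interpolation between a[lo] and a[hi-1].
    const double f = (a[hi - 1] == a[lo]) ? 0.0
        : double(key - a[lo]) / double(a[hi - 1] - a[lo]);
    std::size_t mid = lo + std::size_t(f * double(hi - 1 - lo));
    if (mid >= hi) mid = hi - 1;
    if (a[mid] == key) return mid;
    if (a[mid] <  key) lo = mid + 1; else hi = mid;
  }
  return std::nullopt;
}
```

For uniform keys the interpolation guess lands very close to the target, which is where the expected O(log₂(log₂(n))) number of steps comes from.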

But my keys are not uniform

Now this is all good, but what if the keys we want to store are not uniform? This is where the patchmap comes in. By sorting the keys by their hash value the order becomes pseudorandom. If the hash function is good, the values it produces will be uniform even if the keys themselves are not. Therefore, by combining a good hash function with an order based on the hash values, a data structure can be created that stores keys or key-value pairs very memory efficiently and that can interpolate between a (randomly) sorted array and a hash table.
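In code, this pseudorandom order is nothing more than the natural order of the hashed keys. A minimal sketch; the mixer below is a generic stand-in, not the hash function the patchmap uses:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A simple mixer (the splitmix64 finalizer) standing in for a good hash function.
std::uint64_t hash_f(std::uint64_t x) {
  x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27; x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Sorting by the hash value instead of the key itself makes the order
// pseudorandom and (for a good hash) uniform, even if the keys are clustered.
void pseudorandom_sort(std::vector<std::uint64_t>& keys) {
  std::sort(keys.begin(), keys.end(),
            [](std::uint64_t a, std::uint64_t b) { return hash_f(a) < hash_f(b); });
}
```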

Performance compared to other hash tables

The performance of a hash table has two major characteristics: memory efficiency and the time needed to retrieve, insert or delete an entry. The hash table with the best performance exhibits the best trade-off between these two.

134217728 random 32-bit integer keys were associated with arbitrary 32-bit integer values; the average memory consumption and the average time for insertion, lookup and deletion per key-value pair are shown in the plots below for different hash tables.

There are two points for the patchmap - one optimized for memory efficiency and one for speed.

As can be seen from the plots, the patchmap can effectively claim a spot on the invisible line that demarcates the apparent best trade-off between memory efficiency and speed, and it can be tuned to be the best choice for memory efficiencies between roughly 60% and 80%. Below 60% the overhead of the interpolation search and the more expensive hash function make the patchmap less competitive, and there are many hash tables that are slightly faster. Above 80% there is google::sparse_hash_map looking down on us all high and mighty; let's see what we can do about that.

Average cost for insertion and lookup in the hash map as it is being filled up with 268435456 pseudo-random keys. Insertion (in red) shows a strongly quadratic behavior and the function 1+(α/(1-α)²+α)/4 is indicated with a solid black line. Lookup (in blue) is almost linear in α and shows a steep increase only for load factors close to 1. The theoretically derived complexity bound for lookup, 1+log₂(log₂((2-α)/(2-2α))+1), is indicated with a dashed black line.
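For reference, a tiny program evaluating the two cost models from the caption over a range of load factors; the formulas are taken directly from the text above, nothing else is assumed:

```cpp
#include <cmath>
#include <cstdio>

// Tabulate the insertion and lookup cost models as a function of the load factor α.
int main() {
  for (int i = 1; i <= 9; ++i) {
    const double a = i / 10.0;
    const double insert = 1.0 + (a / ((1.0 - a) * (1.0 - a)) + a) / 4.0;
    const double lookup = 1.0 + std::log2(std::log2((2.0 - a) / (2.0 - 2.0 * a)) + 1.0);
    std::printf("alpha=%.1f  insert~%.2f  lookup~%.2f\n", a, insert, lookup);
  }
}
```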

Inserting into a patch

To know which buckets in the table are set and which are not, one could reserve a special value, as the Google implementations do with their empty key. I have instead decided to use a binary mask that indicates whether an entry is set or not.
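A minimal sketch of such an occupancy mask, one bit per bucket (not the actual patchmap layout):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per bucket: 1 means the bucket holds an entry, 0 means it is free.
struct occupancy_mask {
  std::vector<std::uint64_t> bits;
  explicit occupancy_mask(std::size_t n) : bits((n + 63) / 64, 0) {}
  bool is_set(std::size_t i) const { return (bits[i >> 6] >> (i & 63)) & 1u; }
  void set(std::size_t i)          { bits[i >> 6] |=  (std::uint64_t(1) << (i & 63)); }
  void unset(std::size_t i)        { bits[i >> 6] &= ~(std::uint64_t(1) << (i & 63)); }
};
```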

When inserting into the patchmap it goes something like this:

  • map the hash value of the key to the range of the table
  • look at the mask to find the closest free bucket to this position
  • insert the key-value pair there
  • restore the order by sorting by the hash value
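A strongly simplified sketch of these four steps; the hash function, the occupancy mask and the bucket layout are placeholders, and the real patchmap additionally handles wrap-around, growth and interpolation search, which are omitted here:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct sketch_table {
  std::vector<std::pair<std::uint32_t, std::uint32_t>> bucket; // key, value
  std::vector<bool> used;                                      // occupancy mask

  explicit sketch_table(std::size_t n) : bucket(n), used(n, false) {}

  static std::uint32_t hash_f(std::uint32_t x) {               // stand-in hash
    x ^= x >> 16; x *= 0x45d9f3b; x ^= x >> 16;
    return x;
  }
  // 1. map the hash value of the key to the range of the table
  std::size_t map(std::uint32_t h) const {
    return std::size_t((std::uint64_t(h) * bucket.size()) >> 32);
  }

  void insert(std::uint32_t key, std::uint32_t value) {
    std::size_t i = map(hash_f(key));
    // 2. find the closest free bucket (scanning outwards; assumes the table is not full)
    for (std::size_t d = 0; ; ++d) {
      if (i >= d && !used[i - d])              { i -= d; break; }
      if (i + d < used.size() && !used[i + d]) { i += d; break; }
    }
    // 3. place the key-value pair there
    bucket[i] = {key, value};
    used[i]   = true;
    // 4. restore the order: bubble the new entry through its neighbours until
    //    the occupied buckets are sorted by hash value again
    while (i > 0 && used[i - 1] &&
           hash_f(bucket[i - 1].first) > hash_f(bucket[i].first)) {
      std::swap(bucket[i - 1], bucket[i]); --i;
    }
    while (i + 1 < bucket.size() && used[i + 1] &&
           hash_f(bucket[i + 1].first) < hash_f(bucket[i].first)) {
      std::swap(bucket[i + 1], bucket[i]); ++i;
    }
  }
};
```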

Competing with google::sparse_hash_map

The patchmap achieves a good speed/space trade-off, but it cannot beat Google's sparse hash map. The comparison to it and to sparsepp was not entirely fair in the first place, as these are the two hash tables that minimize not the average but the peak memory consumption. Each time the other hash tables (including the patchmap) have to grow, they reserve a new contiguous block of memory and copy the values from the old table, which has become too small, into the new one, therefore taking up at least twice the memory during each grow operation. To make the comparison fair, I first switch from this simple growing scheme to a hashed array tree.

A hashed array tree allows the hash table to grow virtually in place, at some additional cost. By carefully tuning the maximum load factor before expanding and the expansion factor, the patchmap can be made to compete with google::sparse_hash_map.
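A minimal sketch of a hashed array tree (not the patchmap's actual storage layer): the elements live in fixed-size blocks reached through a small directory, so growing allocates new blocks instead of copying the whole table at once.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal hashed array tree: an index operation first selects the block,
// then the slot within the block.
template <class T, std::size_t BlockSize = 4096>
class hashed_array_tree {
  std::vector<std::unique_ptr<T[]>> blocks_;
  std::size_t size_ = 0;
public:
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) { return blocks_[i / BlockSize][i % BlockSize]; }
  // Growing appends blocks; existing elements are never copied or moved, so
  // the peak memory overhead of growing is one block, not a whole second table.
  void resize(std::size_t n) {
    const std::size_t need = (n + BlockSize - 1) / BlockSize;
    while (blocks_.size() < need)
      blocks_.push_back(std::make_unique<T[]>(BlockSize));
    size_ = n;
  }
};
```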

The points for the patchmap ● and google::sparse_hash_map ○ overlap for lookup and insertion, but google::sparse_hash_map has faster deletions.

Tombstones make things fast but they also create zombies

In this test the patchmap can be tuned to almost exactly the same space-time trade-off as google::sparse_hash_map. The only significant difference exists for deletions. Usually, when deleting an element from a hash table, elements are not truly removed; they are only marked as dead. This technique is known as tombstoning. It is used mostly because it is the fastest way to delete elements, but also because the alternatives are often much harder to implement. This is not the case in the patchmap: deleting works just like insertion, but in reverse. Zombie entries can accumulate, and if we don't pay close attention they can take over the table. Such a worst case can be triggered by repeatedly inserting and deleting entries while keeping the table at the same size, if the implementation has not taken specific precautions to mitigate this issue, usually by rehashing the table as soon as too many zombies have accumulated.
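Such a stress test is easy to write. Here is a hedged sketch that keeps the number of live entries constant while churning through insertions and deletions; std::unordered_map is only a placeholder for whatever table is under test:

```cpp
#include <cstdint>
#include <unordered_map>  // stand-in; substitute the hash table under test

int main() {
  std::unordered_map<std::uint64_t, std::uint64_t> table;  // placeholder table type
  const std::uint64_t live = 1 << 20;                      // keep ~1M live entries
  for (std::uint64_t k = 0; k < live; ++k) table[k] = k;
  // Churn: every round deletes the oldest key and inserts a fresh one, so the
  // size stays constant while tombstoning implementations accumulate zombies
  // unless they rehash.
  for (std::uint64_t round = 0; round < 100 * live; ++round) {
    table.erase(round);
    table[live + round] = round;
  }
}
```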

zombie revenge

patchmap: ●, khash: ×, bytell: +, google::sparse_hash_map: ○, google::dense_hash_map: ⬠, flat_hash_map: △, std::unordered_map: ◇, sparsepp: ◻

How to make a good hash function?

I had been thinking about good hash functions for a long time, so it was not hard to combine two multiplications and a binary rotation to get a hash function with a very even output. This is similar to what Malte Skarupke ended up doing, starting with a single multiplication with the golden ratio, a.k.a. Fibonacci hashing, and composing it with a bit shift. Galois field multiplications are a great building block too, but my laptop is a little old and therefore struggles with carry-less multiplications.
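A hash along those lines might look like the following; the constants are well-known mixing constants chosen for illustration and not necessarily the ones used in the patchmap:

```cpp
#include <cstdint>

// Rotate left by r bits (r must be in (0, 64)).
inline std::uint64_t rotl(std::uint64_t x, unsigned r) {
  return (x << r) | (x >> (64 - r));
}

// Two multiplications with odd constants, separated by a rotation so that the
// high and low halves mix into each other.
inline std::uint64_t hash_f(std::uint64_t x) {
  x *= 0x9e3779b97f4a7c15ULL;   // first multiply (64-bit golden ratio constant)
  x  = rotl(x, 32);             // rotate so the high bits influence the low bits
  x *= 0xc2b2ae3d27d4eb4fULL;   // second multiply
  return x;
}
```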

The backstory

Recently I wanted to compute a large number of large histograms. The fastest way of computing histograms involves hash tables. As C++ is my favorite programming language, I had chosen std::unordered_map as the backend. Everything was going smoothly, just as I had imagined, but when I ran the code on my laptop instead of the servers at my institute, everything froze. The memory required was around 8 times what I had anticipated. It turns out I was triggering close to the worst case for std::unordered_map, as I was mapping 16-bit values to 16-bit values on a 64-bit machine. std::unordered_map is a chaining hash table, which means that for each pair of 16-bit values it needs to store a 64-bit pointer; in this case that creates an especially large overhead. I ended up sorting the values first and computing the histograms from the sorted arrays, which takes much longer but uses the minimum amount of memory. But I still had memory to spare; I was using only 6 GB of the 16 GB in my laptop. Would it not be great if there was a data structure that could interpolate between the two solutions? Or at least a hash table that was more memory efficient?

And before I even found out that there are indeed hash tables that are far more memory efficient, I had an idea that I needed to try.

The idea is, as has been hinted at, to interpolate between a sorted array and a hash table: map the keys linearly onto the table and resolve collisions, should they occur, based on the numerical value of the keys. This works well if the keys are uniformly distributed, and in this case it has the added effect of essentially sorting the keys in O(1) time per key and with O(1) extra memory, though not in place, as for this to work the table needs some fractional extra space; at least 5% extra would be good.
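A toy program illustrating this for uniform keys; the illustrative assumptions here are a fixed table of 2²⁰ buckets filled to 95% and no wrap-around at the end of the table:

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <random>
#include <utility>
#include <vector>

// Uniform keys are mapped linearly onto the table, collisions are resolved by
// key order, and reading the buckets front to back yields the keys sorted.
int main() {
  const std::size_t n = 1 << 20;                   // buckets, ~5% more than keys
  std::vector<std::optional<std::uint32_t>> table(n);
  std::mt19937 rng(42);
  for (std::size_t k = 0; k < n * 95 / 100; ++k) { // fill to 95%
    std::uint32_t cur = rng();
    std::size_t i = std::size_t((std::uint64_t(cur) * n) >> 32); // linear map
    for (; i < n; ++i) {                           // walk right, keeping order
      if (!table[i]) { table[i] = cur; break; }
      if (*table[i] > cur) std::swap(cur, *table[i]);
    } // a real table would wrap around or grow instead of dropping keys here
  }
  // verify: the occupied buckets are now in sorted order
  std::uint32_t last = 0; bool sorted = true;
  for (const auto& b : table)
    if (b) { sorted = sorted && (*b >= last); last = *b; }
  std::printf("sorted: %s\n", sorted ? "yes" : "no");
}
```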

Comparing to other approaches

This approach might look very similar to the one suggested in "Ordered Hash Tables" by O. Amble and D. E. Knuth (1974), but there is a major difference: the hash tables there are ordered first by the hash of the keys and then by the value of the keys. If the entries are instead ordered only by the hash value, there exists a global order, which suddenly allows advanced search strategies like interpolation search and binary search.

In fact the approach is much more similar to robin hood hashing with linear probing, because if two keys collide at the same position, the key whose hash value is lower will also have been displaced more and will therefore displace the key with the larger hash value, thereby maintaining the same order. There has, however, been no hash table using robin hood hashing with linear probing that takes advantage of this fact.

Availability

https://github.com/1ykos/ordered_patch_map

[ C++17, open source, beta stage; let me know if you want to use it, and I will try to make it work for you! ]

Detailed description

For a more detailed description of the inner workings of the patchmap you can have a look at this open-access article:

http://www.dcc.ufla.br/infocomp/index.php/INFOCOMP/article/view/581