Our Discord: https://discord.gg/ajezQHdnRU. Please email mathteca@gmail.com if any organization or society wants to partner with Mathteca.

Hash Table

Hash table is a data structure that stores data in the form of "key-value". The so-called storage of data in the form of "key-value" means that any key value uniquely corresponds to a certain location in the memory. Just enter the search key value and you can quickly find its corresponding value. A hash table can be understood as a high-level array. The subscripts of this array can be large integers, floating point numbers, strings or even structures.

Hash Function

In order to make the key value correspond to the location in the memory, it is necessary to calculate the index for the key value, that is, to calculate where the data should be placed. This function that calculates the index based on the key value is called a hash function, also known as a hash function. For example, if the key value is a person's ID card number, the hash function can be the last four digits of the number, or of course the first four digits of the number. The commonly used "mobile phone number" in daily life is also a hash function. In actual applications, the key values may be more complex things, such as floating point numbers, strings, structures, etc. In this case, an appropriate hash function must be designed according to the specific situation. The hash function should be easy to calculate and try to make the calculated index evenly distributed.

After calculating the index for key, we can know where the value corresponding to each key value should be placed. Suppose we use array a to store data and the hash function is f, then the key-value pair (key, value) should be placed on a[f(key)]. No matter what type the key value is or how big the range is, f(key) is an integer within the acceptable range and can be used as the subscript of the array.

Hash Collision

If the index calculated by the hash function is different for any key value, then just put (key, value) into the corresponding position according to the index. But in fact, there are often two different key values, and their indexes calculated by the hash function are the same. At this time, some methods are needed to deal with conflicts. In OI, the most commonly used method is the zipper method.

The zipper method is also called open hashing.

The zipper method is to open a linked list in each place where data is stored. If there are multiple key values indexed to the same place, just put them all into the linked list at that location. When querying, you need to scan the entire linked list at the corresponding position, and compare the key value of each data to see if it is consistent with the key value of the query. If the range of the index is 1, ... ,M and the size of the hash table is N, then an insert/query needs to do the desired O(N/M) comparisons.

Implementation

C++

const int SIZE = 1000000;

const int M = 999997;

struct HashTable {

struct Node {

int next, value, key;

} data[SIZE];

int head[M], size;

int f(int key) { return (key % M + M) % M; }

int get(int key) {

for (int p = head[f(key)]; p; p = data[p].next)

if (data[p].key == key) return data[p].value;

return -1;

}

int modify(int key, int value) {

for (int p = head[f(key)]; p; p = data[p].next)

if (data[p].key == key) return data[p].value = value;

}

int add(int key, int value) {

if (get(key) != -1) return -1;

data[++size] = (Node){head[f(key)], value, key};

head[f(key)] = size;

return value;

}

};

Python

M = 999997

SIZE = 1000000

class Node:

def __init__(self, next = None, value = None, key = None):

self.next = next

self.value = value

self.key = key

data = [Node() for _ in range(SIZE)]

head = [0] * M

size = 0

def f(key):

return key % M

def get(key):

p = head[f(key)]

while p:

if data[p].key == key:

return data[p].value

p = data[p].next

return -1

def modify(key, value):

p = head[f(key)]

while p:

if data[p].key == key:

data[p].value = value

return data[p].value

p = data[p].next

def add(key, value):

if get(key) != -1:

return -1

size = size + 1

data[size] = Node(head[f(key)], value, key)

head[f(key)] = size

return value

Here is another encapsulated template, which can be used like map and is shorter.

struct hash_map { // Hash table template

struct data {

long long u;

int v, nex;

}; // Forward star structure

data e[SZ << 1]; // SZ is const int representing size

int h[SZ], cnt;

int hash(long long u) { return (u % SZ + SZ) % SZ; }

// The reason why (u % SZ + SZ) % SZ is used here instead of u % SZ is

// The % operation in C++ cannot convert negative numbers into positive numbers

int& operator[](long long u) {

int hu = hash(u); // Get the head pointer

for (int i = h[hu]; i; i = e[i].nex)

if (e[i].u == u) return e[i].v;

return e[++cnt] = (data){u, -1, h[hu]}, h[hu] = cnt, e[cnt].v;

}

hash_map() {

cnt = 0;

memset(h, 0, sizeof(h));

}

};

Here, the hash function is designed for the type of key value and returns a linked list head pointer for query. In this template we write a hash table with key-value pairs of type (long long, int), and return -1 when querying for non-existent key values. The function hash_map() is used for initialization at definition time.

Closed Hashing

The closed hashing method stores all records directly in the hash table, and if a conflict occurs, exploration continues according to some method.

For example, the linear detection method: if a conflict occurs at d, check d + 1, d + 2...

const int N = 360007; // N is the maximum number of elements that can be stored

class Hash {

private:

int keys[N];

int values[N];

public:

Hash() { memset(values, 0, sizeof(values)); }

int& operator[](int n) {

// Return a reference to the corresponding Hash[Key]

// Modify to a value other than 0. When 0, it is considered empty.

int idx = (n % N + N) % N, cnt = 1;

while (keys[idx] != n && values[idx] != 0) {

idx = (idx + cnt * cnt) % N;

cnt += 1;

}

keys[idx] = n;

return values[idx];

}

};

Exercise

Non Repetitive Number: https://www.luogu.com.cn/problem/P4305 (Luogu)

Page updated

Google Sites

Report abuse