Our Discord: https://discord.gg/ajezQHdnRU. Please email mathteca@gmail.com if any organization or society wants to partner with Mathteca.
Hash table is a data structure that stores data in the form of "key-value". The so-called storage of data in the form of "key-value" means that any key value uniquely corresponds to a certain location in the memory. Just enter the search key value and you can quickly find its corresponding value. A hash table can be understood as a high-level array. The subscripts of this array can be large integers, floating point numbers, strings or even structures.
In order to make the key value correspond to the location in the memory, it is necessary to calculate the index for the key value, that is, to calculate where the data should be placed. This function that calculates the index based on the key value is called a hash function, also known as a hash function. For example, if the key value is a person's ID card number, the hash function can be the last four digits of the number, or of course the first four digits of the number. The commonly used "mobile phone number" in daily life is also a hash function. In actual applications, the key values may be more complex things, such as floating point numbers, strings, structures, etc. In this case, an appropriate hash function must be designed according to the specific situation. The hash function should be easy to calculate and try to make the calculated index evenly distributed.
After calculating the index for key, we can know where the value corresponding to each key value should be placed. Suppose we use array a to store data and the hash function is f, then the key-value pair (key, value) should be placed on a[f(key)]. No matter what type the key value is or how big the range is, f(key) is an integer within the acceptable range and can be used as the subscript of the array.
If the index calculated by the hash function is different for any key value, then just put (key, value) into the corresponding position according to the index. But in fact, there are often two different key values, and their indexes calculated by the hash function are the same. At this time, some methods are needed to deal with conflicts. In OI, the most commonly used method is the zipper method.
The zipper method is also called open hashing.
The zipper method is to open a linked list in each place where data is stored. If there are multiple key values indexed to the same place, just put them all into the linked list at that location. When querying, you need to scan the entire linked list at the corresponding position, and compare the key value of each data to see if it is consistent with the key value of the query. If the range of the index is 1, ... ,M and the size of the hash table is N, then an insert/query needs to do the desired O(N/M) comparisons.
Implementation
C++
const int SIZE = 1000000;
const int M = 999997;
struct HashTable {
struct Node {
int next, value, key;
} data[SIZE];
int head[M], size;
int f(int key) { return (key % M + M) % M; }
int get(int key) {
for (int p = head[f(key)]; p; p = data[p].next)
if (data[p].key == key) return data[p].value;
return -1;
}
int modify(int key, int value) {
for (int p = head[f(key)]; p; p = data[p].next)
if (data[p].key == key) return data[p].value = value;
}
int add(int key, int value) {
if (get(key) != -1) return -1;
data[++size] = (Node){head[f(key)], value, key};
head[f(key)] = size;
return value;
}
};
Python
M = 999997
SIZE = 1000000
class Node:
def __init__(self, next = None, value = None, key = None):
self.next = next
self.value = value
self.key = key
data = [Node() for _ in range(SIZE)]
head = [0] * M
size = 0
def f(key):
return key % M
def get(key):
p = head[f(key)]
while p:
if data[p].key == key:
return data[p].value
p = data[p].next
return -1
def modify(key, value):
p = head[f(key)]
while p:
if data[p].key == key:
data[p].value = value
return data[p].value
p = data[p].next
def add(key, value):
if get(key) != -1:
return -1
size = size + 1
data[size] = Node(head[f(key)], value, key)
head[f(key)] = size
return value
Here is another encapsulated template, which can be used like map and is shorter.
struct hash_map { // Hash table template
struct data {
long long u;
int v, nex;
}; // Forward star structure
data e[SZ << 1]; // SZ is const int representing size
int h[SZ], cnt;
int hash(long long u) { return (u % SZ + SZ) % SZ; }
// The reason why (u % SZ + SZ) % SZ is used here instead of u % SZ is
// The % operation in C++ cannot convert negative numbers into positive numbers
int& operator[](long long u) {
int hu = hash(u); // Get the head pointer
for (int i = h[hu]; i; i = e[i].nex)
if (e[i].u == u) return e[i].v;
return e[++cnt] = (data){u, -1, h[hu]}, h[hu] = cnt, e[cnt].v;
}
hash_map() {
cnt = 0;
memset(h, 0, sizeof(h));
}
};
Here, the hash function is designed for the type of key value and returns a linked list head pointer for query. In this template we write a hash table with key-value pairs of type (long long, int), and return -1 when querying for non-existent key values. The function hash_map() is used for initialization at definition time.
Closed Hashing
The closed hashing method stores all records directly in the hash table, and if a conflict occurs, exploration continues according to some method.
For example, the linear detection method: if a conflict occurs at d, check d + 1, d + 2...
const int N = 360007; // N is the maximum number of elements that can be stored
class Hash {
private:
int keys[N];
int values[N];
public:
Hash() { memset(values, 0, sizeof(values)); }
int& operator[](int n) {
// Return a reference to the corresponding Hash[Key]
// Modify to a value other than 0. When 0, it is considered empty.
int idx = (n % N + N) % N, cnt = 1;
while (keys[idx] != n && values[idx] != 0) {
idx = (idx + cnt * cnt) % N;
cnt += 1;
}
keys[idx] = n;
return values[idx];
}
};
Non Repetitive Number: https://www.luogu.com.cn/problem/P4305 (Luogu)