Hash

A Hash Table/Hash Map is a data structure that allows us to store key-value pairs. Normally a hash function is applied to the key to produce a value that points to the bucket in the array where we store the pair. Operations such as insert, update, and delete on Hash tables are very efficient and thus the hash tables are very widely used in general. The key and value can be of any type. The bucket term is usually an index into an array to which the hash function maps to. We can think of the hash function as producing an integer value out of another type which may be an integer or a string .

Simple Hash

File: "hash1.cpp"

#include <iostream>using namespace std ;/* destructor for hash1 class .*/class pair1 { public: int key ; int value ; pair1( int keyP , int valueP ) { key = keyP ; value = valueP ; } pair1() { }};class hash1 { public: pair1** holder ; int size ; hash1( int sizeP ) { size = sizeP ; holder = new pair1*[size] ; for( int i1=0 ; i1<size ; i1++ ) { holder[i1] = NULL ;

} //for } ~hash1() { for( int i1=0 ; i1<size ; i1++ ) { if ( holder[i1] != NULL ) delete holder[i1] ;

} //for delete [] holder ; } int hash ( int key1 ) { return ( key1 % size ) ; } bool put( int key, int value ) { //Get hash int hashCode = hash( key ) ; if ( holder[ hashCode ] != NULL ) if ( holder[ hashCode ]->key != key ) return false ; //collision else //do an update { holder[ hashCode ]->value = value ; return true ; } pair1* ptrPair = new pair1(key, value) ; holder[ hashCode ] = ptrPair ; return true ; } bool removeValue( int key ) { //TO DO return true ; } int getValue ( int key ) { //Get hash int hashCode = hash( key ) ; if ( holder[ hashCode] != NULL ) if ( holder[ hashCode ]->key == key ) return holder[ hashCode] -> value ; else { cout << "Key not found." << endl ; return -1 ; } cout << "Key not found." << endl ; return -1 ; } void print() { for( int i1=0 ; i1 < size ; i1++ ) { if ( holder[ i1 ] == NULL ) printf( "%10d %s\n" , i1 , "NULL" ) ; else printf( "%10d pair{ %d, %d }\n" , i1 , holder[ i1 ]->key, holder[ i1 ]->value ) ; } //for }};int main(){ hash1 hash1Obj( 10 ) ; hash1Obj.put( 5 , 34 ) ; hash1Obj.put( 7 , 53 ) ; hash1Obj.put( 70 , 32 ) ; hash1Obj.print() ; //cout << hash1Obj.getValue( 12 ) << endl ; return 1 ;}

We are not taking care of collisions in the above case. There are 2 ways of addressing collisions Open Addressing and Closed Addressing.

In Open Addressing if we find that the slot is occupied then we keep going down the array till we find an empty slot and insert our entry in there. In Open Addressing we have linear and quadratic problng. We shall address linear probing .

Exercise

We have a hash table of size of 7 .

What will be the contents of the buckets after the following entries have been inserted. The hash table will be created by the above program.

Key,Val

0,7

7,0

1,10

3,11

9,12

11,13

12,14

13,15

8,16

19,17

Linear Probing

In linear probing a collision is handled by finding the next empty slot and placing the pair inside it.

File: "hash2.cpp"

#include <iostream>using namespace std ;/* destructor for hash1 class . Open Addressing Linear Probe*/class pair1 { public: int key ; int value ; pair1( int keyP , int valueP ) { key = keyP ; value = valueP ; } pair1() { }};class hash1 { public: pair1** holder ; int size ; hash1( int sizeP ) { size = sizeP ; holder = new pair1*[size] ; for( int i1=0 ; i1<size ; i1++ ) { holder[i1] = NULL ; } //for } int hash ( int key1 ) { return ( key1 % size ) ; } bool put( int key, int value ) { //Get hash int hashCode = hash( key ) ; if ( holder[ hashCode ] != NULL ) if ( holder[ hashCode ]->key != key ) { //Find an empty slot for( int i1= hashCode+1 ; i1 < size ; i1++ ) { //Found an empty slot if ( holder[ i1 ] == NULL ) { pair1* ptrPair = new pair1(key, value) ; holder[ hashCode ] = ptrPair ; return true ; } } return false ; //No empty spaces found } else //do an update { holder[ hashCode ]->value = value ; return true ; } pair1* ptrPair = new pair1(key, value) ; holder[ hashCode ] = ptrPair ; return true ; } bool removeValue( int key ) { //TO DO return true ; } int getValue ( int key ) { //Get hash int hashCode = hash( key ) ; //TO DO return -1 ; }};int main(){ hash1 hash1Obj( 10 ) ; hash1Obj.put( 12 , 24 ) ; hash1Obj.put( 12 , 25 ) ; cout << hash1Obj.getValue( 12 ) << endl ; return 1 ;}

The disadvantage with linear probing is that say we have many keys that map to a hash code of 5. Pretty soon the spots of 6, 7, 8, 9, 10 get filled up. Now we have a key whose hash code is 6 . Since the next few entries are filled up we have to traverse many entries in order to find an empty slot. There is another technique called quadratic probing that overcomes this problem by not storing entries in a contiguous location if there is a collision.

Exercise

1) Complete the functions "getalue" and "removeValue" .

2)

We have a hash table of size of 7 .

What will be the contents of the buckets after the following entries have been inserted. The hash table will be created by the above program.

Key,Val

0,7

7,0

1,10

3,11

9,12

11,13

12,14

13,15

8,16

19,17

Quadratic Probing

File: "hash3.cpp"

For example: Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.

File: "hash3.cpp"

#include <iostream> using namespace std ;/* Open Addressing Quadratic Probe */ class pair1 { public: int key ; int value ; pair1( int keyP , int valueP ) { key = keyP ; value = valueP ; } pair1() { }}; class hash1 { public: pair1** holder ; int size ; hash1( int sizeP ) { size = sizeP ; holder = new pair1*[size] ; for( int i1=0 ; i1<size ; i1++ ) { holder[i1] = NULL ; } //for } int hash ( int key1 ) { return ( key1 % size ) ; } bool put( int key, int value ) { //Get hash int hashCode = hash( key ) ; int quad = 0 ; for( int i1= hashCode+quad ; i1 < size ; i1 = hashCode + quad*quad ) { //Found an empty slot if ( holder[ i1 ] != NULL ) { if ( holder[ i1 ]->key == key ) { holder[ i1 ]->value = value ; return true ; } } else { pair1* ptrPair = new pair1(key, value) ; holder[ i1 ] = ptrPair ; return true ; } quad++ ; } return true ; } bool removeValue( int key ) { //Get hash int hashCode = hash( key ) ; int quad = 0 ; for( int i1= hashCode+quad ; i1 < size ; i1 = hashCode + quad*quad ) { //Found an empty slot if ( holder[ i1 ] != NULL ) { if ( holder[ hashCode ]->key == key ) { delete holder[ hashCode ] ; holder[ hashCode ] = NULL ; return true ; } } quad++ ; } return true ; } int getValue ( int key ) { //Get hash int hashCode = hash( key ) ; int quad = 0 ; for( int i1= hashCode+quad ; i1 < size ; i1 = hashCode + quad*quad ) { //Found an empty slot if ( holder[ i1 ] != NULL ) { if ( holder[ hashCode ]->key == key ) return holder[ hashCode ]->value ; } quad++ ; } //Value not found return -1 ; } void print() { for( int i1=0 ; i1 < size ; i1++ ) { if ( holder[ i1 ] == NULL ) printf( "%10d %s\n" , i1 , "NULL" ) ; else printf( "%10d pair{ %d, %d }\n" , i1 , holder[ i1 ]->key, holder[ i1 ]->value ) ; } //for }}; int main(){ hash1 hash1Obj( 7 ) ; //50, 700, 76, 85, 92, 73, 101. hash1Obj.put( 50 , 50 ) ; hash1Obj.put( 700 , 700 ) ; hash1Obj.put( 76 , 76 ) ; hash1Obj.put( 85 , 85 ) ; hash1Obj.put( 92 , 92 ) ; hash1Obj.put( 73 , 73 ) ; hash1Obj.put( 101 , 101 ) ; hash1Obj.print() ; //cout << hash1Obj.getValue( 12 ) << endl ; return 1 ;}

Exercise:

1)

We have a hash table of size of 7 .

What will be the contents of the buckets after the following entries have been inserted. The hash table will be created by the above program.

Key,Val

0,7

7,0

1,10

3,11

9,12

11,13

12,14

13,15

8,16

19,17

The closed addressing approach resolves the collision problem by using linked lists for each bucket instead of a single entry.

Closed Addressing

In closed addressing we keep a list for each bucket. Every key that maps to the hashcode we place it in the list.

File: "hash4.cpp"

#include <iostream> #include <list> #include <iterator>/* Use auto to simplify things. print function to print our hashmap */ using namespace std ; class pair1 { public: int key ; int value ; pair1( int keyP , int valueP ) { key = keyP ; value = valueP ; } pair1() { }}; class hash1 { public: list <pair1*>** holder ; int size ; hash1( int sizeP ) { size = sizeP ; holder = new list<pair1*>*[size] ; for( int i1=0 ; i1<size ; i1++ ) { holder[i1] = NULL ; } //for } int hash ( int key1 ) { return ( key1 % size ) ; } bool put( int key, int value ) { //Get hash int hashCode = hash( key ) ; if ( holder[ hashCode ] == NULL ) { holder[ hashCode ] = new list<pair1*>() ; pair1* ptrPair = new pair1(key, value) ; holder[ hashCode ]->push_back( ptrPair ) ; } else { //check if key pair already exists list<pair1*> :: iterator it; for(it = holder[ hashCode ]->begin(); it != holder[ hashCode ]->end(); ++it) { if ( key == (*it)->key ) { (*it)->value = value ; return true ; } }//for pair1* ptrPair = new pair1(key, value) ; holder[ hashCode ]->push_back( ptrPair ) ; } return true ; } bool removeValue( int key ) { int hashCode = hash( key ) ; list<pair1*> :: iterator it; for(it = holder[ hashCode ]->begin(); it != holder[ hashCode ]->end(); ++it) { if ( key == (*it)->key ) { holder[ hashCode ]->remove( *it ) ; return true ; } }//for return false ; } int getValue ( int key ) { //Get hash int hashCode = hash( key ) ; if ( holder[ hashCode] != NULL ) { list<pair1*> :: iterator it; for(it = holder[ hashCode ]->begin(); it != holder[ hashCode ]->end(); ++it) { if ( key == (*it)->key ) { return (*it)->value ; } }//for } else return -1 ; return -1 ; } void print() { for( int i1=0 ; i1 < size ; i1++ ) { if ( holder[ i1 ] == NULL ) printf( "%10d %s\n" , i1 , "NULL" ) ; else { list<pair1*> :: iterator it; printf( "%10d " , i1 ) ; for(it = holder[ i1 ]->begin(); it != holder[ i1 ]->end(); ++it) { printf( "pair{ %d, %d } " ,i1, (*it)->key , (*it)->value ) ; }//for printf("\n" ) ; } } //for }}; int main(){ hash1 hash1Obj( 10 ) ; hash1Obj.put( 12 , 24 ) ; hash1Obj.put( 2 , 25 ) ; hash1Obj.put( 3 , 26 ) ; hash1Obj.print() ; //cout << hash1Obj.getValue( 12 ) << endl ; return 1 ;}

Exercise:

1) Write a function "clear" that will clear all the elements in the hash in the above file "hash4.cpp" .

void clear()

{

//TO DO

}

STL

File: "stl1.cpp"

#include <iostream>#include <iterator>#include <map>using namespace std;int main(){ // empty map container map<int, int> map1; // insert elements in random order map1.insert(pair<int, int>(1, 40)); map1.insert(pair<int, int>(2, 30)); map1.insert(pair<int, int>(3, 60)); map1.insert(pair<int, int>(4, 20)); map1.insert(pair<int, int>(5, 50)); cout << map1[1] << endl ; cout << map1[5] << endl ; cout << map1.size() << endl ; map<int, int>::iterator itr; for (itr = map1.begin(); itr != map1.end(); ++itr) { cout << '\t' << itr->first << '\t' << itr->second << '\n'; } cout << endl; map1.erase(3 ) ; cout << map1.size() << endl ; for (itr = map1.begin(); itr != map1.end(); ++itr) { cout << '\t' << itr->first << '\t' << itr->second << '\n'; } cout << endl;}

Exercise :

1) Write the above program to use "string" as the key and value .

Load Factor

To find the order we need to find the steps taken by the hash function and the steps to find our entry.

The hash function will work on a key and the key size can be assumed to be small in comparison to the number of entries n. We can state that the hash will take O(1) . Now if we have n entries the worst case scenario is that we have all the entries in 1 bucket in the closed addressing scheme and the order will be of n . In the best case we have entry and we get O(1) . We want the number of entries to be smaller than the number of buckets. If we have n as the number of entries and b as the number of buckets then we have the :

Load Factor = n / b

This number should be kept small.

An example scheme might be if the load factor goes above .75 then a new hash is created with double the number of buckets and all the pairs from the old map are inserted into the new hash. This is called rehashing.

Hash

C++ STL has a built in function that we can use to hash.

File: "stl2.cpp"

// hash example#include <iostream>#include <functional>#include <string>using namespace std ;int main (){ char nts1[] = "Test"; char nts2[] = "Test"; string str1 (nts1); string str2 (nts2); hash<char*> ptr_hash; hash<string> str_hash; cout << ptr_hash( nts1 ) << endl ; cout << ptr_hash( nts2 ) << endl ; cout << "same hashes:\n" << boolalpha; cout << "nts1 and nts2: " << (ptr_hash(nts1)==ptr_hash(nts2)) << '\n'; cout << "str1 and str2: " << (str_hash(str1)==str_hash(str2)) << '\n'; return 0;}

Output:

$ g++ stl2.cpp ; ./a.exe

6474841

6474836

same hashes:

nts1 and nts2: false

str1 and str2: true

Exercise:

1) Given a c style string; create your own hash function to produce a number. .

File: "stl3.cpp"

// hash example#include <iostream>#include <functional>#include <string>using namespace std ;int myHash( char* str1 ){ printf( "%d\n" , (int)str1 ) ; return( (int)str1 ) ;}int main (){ char nts1[] = "Test"; char nts2[] = "Test"; string str1 (nts1); string str2 (nts2); myHash( nts1 ) ; hash<char*> ptr_hash; hash<string> str_hash; cout << ptr_hash( nts1 ) << endl ; cout << ptr_hash( nts2 ) << endl ; cout << "same hashes:\n" << boolalpha; cout << "nts1 and nts2: " << (ptr_hash(nts1)==ptr_hash(nts2)) << '\n'; cout << "str1 and str2: " << (str_hash(str1)==str_hash(str2)) << '\n'; return 0;}

Hashing and Encryption

https://sites.google.com/site/ajaymittalinstructor/home/encryption