Data Structures: Hashtables

Hash tables, a.k.a. as hashmaps, are like lists, but with strings for the index,e.g.,

    englishToSpanish['hello']='hola'

They are sets of key-value pairs. In the line above, 'hello' is the key, and 'hola' is the value.

The value can be any type of object, not just a string as in the above example.

Hash Tables in Python: Dictionaries

Python has a built-in hash table data structure called dictionary. You initialize a dictionary with:

    englishToSpanish = {} # or...

    englishToSpanish = {'hello':'hola','goodbye':'adios'}

The latter is a list of key-value pairs, separated by commas. Each key is separated from its value by a colon.

Modifying an existing dictionary

You can add new key-value pairs with statements such as:

    englishToSpanish ['horse']='caballo'

This is something like append in a list.

Retrieving Values

Once you've created a hash table, you can retrieve items by supplying the key you are looking for. So if you want to know the Spanish word for 'horse', you say:

    print englishToSpanish['horse']

Note that hash tables are very much like lists, but the index is a string instead of a number.

Looping through all keys or values: Dictionaries provide the methods 'keys' and 'values' which return lists.

The following prints out all key-value pairs:

    for key in englishToSpanish.keys:
        print key+":"+englishToSpanish[key]

Example Usage

The Django templating system in Google's App Engine uses a Python dictionary to store template values. You set the template values in a controller, and the template rendering code uses the template values to replace 'variables' in the HTML templates. For instance, here's some code you might find in an App Engine controller:

    template_values={'person':person,'message':message,'logouturl':logouturl}
    path = os.path.join(os.path.dirname(__file__),'profile.html')
    self.response.out.write(template.render(path,template_values))

Note that in this example, the keys are all strings, but the first value, for key 'person', is an object of type Person.

In-Class Assignment:

Write a Python program that uses Python's built-in dictionary structure. Your program should:
  • creates a dictionary mapping company names to phone numbers. Put three items in your hash table to begin.
  • Allows the end-user to 1) add new mappings to the table by entering a name and a number, or 2) enter a name to see the corresponding phone number. Within a loop, prompt the end-user to choose operation 1 or 2, then prompt for the necessary inputs, then call the chosen operation.

Algorithm Analysis

Hash tables are one of basic data structures used in programming. Others include arrays, linked lists, hash tables, trees, stacks, and queues. These data structures are either provided as part of the programming language, or as part of a standard library of code provided with the language.

One part of computer science is algorithm analysis-- the study of how efficiently certain operations can be executed. For instance:
    
     Find: if you have a list of names, how many operations does it take to determine if a particular name is in the list?

Such an analysis typically attempts to determine the efficiency in the best, worst, and average cases, and in terms of n, where n is the size of the data set. The analysis is given in Big-O notation, which ignores constants and multipliers and only considers the order of magnitude of the problem, e.g.,

 Finding an element in a list of n items is an O(n) operation as it takes about C*n operations to find any particular element, where C is some
 'insignificant' constant.

Hash Tables and Big-O
Suppose Python didn't provide a Dictionary data structure built-in to the language. We could implement one ourselves by writing a class for each entry, KeyValue, and a class for the hash table itself, HashTable. (Instructor will code this-- see attachment at bottom of this page)

For the instructors, code, What is the Big-O analysis for finding the phone number of some particular person?


Here's a definition of hash table from Wikipedia: 

In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number). It works by transforming the key using a hash function into a hash, a number that is used as an index in an array to locate the desired location ("bucket") where the values should be.

So hash tables use a trick: they convert the key into an integer, which can then be used to index into a list. This is significantly more efficient then looking at every item in a list to find something.

    def hash(self,key):
        # loop through the characters of the key
        #     convert the character to a number  (use ord(c))
        #     add this number to running total
        # return (total modulo sizeOfHashTable)   modulo is '%' in Python.

Discussion

The ASCII number for 'a' is 97. Given this, what would the key 'abc' hash to if your hash table was of size 10?

What is the Big-O analysis of hash table lookup when you use a hash function?

What if two keys 'hash' to the same integer, e.g., what if you had keys 'abc' and 'cab'?

In-Class Assignment II

Implement your own hash table ( (Yes, I know, Python already has dictionaries built-in, but its a very instructive exercise). The user of your class will have to call functions instead of using the Dictionary syntax, e.g.,

    htable =HashTable()
    htable.set('horse','caballo')
    val = htable.get('horse')


Your code will be structured like this:

    class KeyValue:
        def __init__(self,key,value):
            self.key=key
            self.value=value
        
    class HashTable:
        def __init__(self)
            self.list=[0,0,0,0,0,0,0,0,0,0]  # self.list=[0]*10 will also work
        def get(self,key):
            # call self.hash(key) to get index
            # find the correct key at that bucket
        def set(self,key,value):
            # call self.hash(key) to get index
            # insert new key-value pair to that bucket.

        def hash(self,key):
            # see above explanation of a hash function.

1. First get your code to work WITHOUT handling collisions-- just store your key-value at the index returned from the hash function, even if that overwrites existing data.
2. Once (1) works, modify your code so that each list element is itself a list (a bucket) of all key-value pairs that hash to that index.


ċ
hasht.py
(1k)
David Wolber,
Nov 21, 2008, 2:31 PM
Comments