
09-Dictionaries




What is a Collection?


• A collection is nice because we can put more than one value in it and carry them all around in one convenient package
• We have a bunch of values in a single “variable”
• We do this by having more than one place “in” the variable
• We have ways of finding the different places in the variable


A Story of Two Collections


• List
> A linear collection of values that stay in order

• Dictionary
> A “bag” of values, each with its own label


Dictionaries


• Dictionaries are Python’s most powerful data collection
• Dictionaries allow us to do fast database-like operations in Python
• Dictionaries have different names in different languages
> Associative Arrays - Perl / PHP
> Properties or Map or HashMap - Java
> Property Bag - C# / .Net

• Lists index their entries based on the position in the list
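• Dictionaries are like bags: entries have no position to index by (Python 2 prints them in arbitrary order; Python 3.7 and later keep insertion order)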
• So we index the things we put in the dictionary with a “lookup tag”

>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print purse
{'money': 12, 'tissues': 75, 'candy': 3}
>>> print purse['candy']
3

>>> purse['candy'] = purse['candy'] + 2
>>> print purse
{'money': 12, 'tissues': 75, 'candy': 5}


Comparing Lists and Dictionaries


• Dictionaries are like lists except that they use keys instead of numbers to look up values

>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print lst
[21, 183]

>>> lst[0] = 23
>>> print lst
[23, 183]

>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print ddd
{'course': 182, 'age': 21}

>>> ddd['age'] = 23
>>> print ddd
{'course': 182, 'age': 23}
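
One difference the transcripts above do not show is what happens when you assign to an entry that does not exist yet. The sketch below illustrates it; it uses Python 3 syntax rather than the Python 2 syntax of the transcripts, and the key 'year' is made up for the example.

```python
lst = [21, 183]
ddd = {'age': 21, 'course': 182}

# Assigning to a new key is fine: the dictionary simply grows.
ddd['year'] = 2

# Assigning to a list position that does not exist raises IndexError.
try:
    lst[2] = 2
    grew = True
except IndexError:
    grew = False

# Lists grow with append() instead.
lst.append(2)

print(ddd, lst, grew)
```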


Dictionary Literals (Constants)


• Dictionary literals use curly braces and have a list of key : value pairs
• You can make an empty dictionary using empty curly braces

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print jjj
{'jan': 100, 'chuck': 1, 'fred': 42}

>>> ooo = { }
>>> print ooo
{}
>>>


Many Counters with a Dictionary


• One common use of a dictionary is counting how often we “see” something

>>> ccc = dict()
>>> ccc['csev'] = 1
>>> ccc['cwen'] = 1
>>> print ccc
{'csev': 1, 'cwen': 1}

>>> ccc['cwen'] = ccc['cwen'] + 1
>>> print ccc
{'csev': 1, 'cwen': 2}


Dictionary Tracebacks


• It is an error to reference a key which is not in the dictionary
• We can use the in operator to see if a key is in the dictionary

>>> ccc = dict()
>>> print ccc['csev']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'csev'

>>> print 'csev' in ccc
False
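
A minimal sketch of two ways to avoid the KeyError, written in Python 3 syntax rather than the Python 2 syntax of the transcript above. The helper name lookup_or_none is hypothetical, chosen only for this illustration.

```python
ccc = dict()

# Option 1: guard the lookup with the in operator.
if 'csev' in ccc:
    found = ccc['csev']
else:
    found = None

# Option 2: attempt the lookup and catch the KeyError.
def lookup_or_none(d, key):
    """Return d[key], or None when the key is absent (hypothetical helper)."""
    try:
        return d[key]
    except KeyError:
        return None

ccc['cwen'] = 1
print(found, lookup_or_none(ccc, 'csev'), lookup_or_none(ccc, 'cwen'))
```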


The get method for dictionaries


• The pattern of checking whether a key is already in a dictionary, and assuming a default value if it is not, is so common that there is a method called get() that does it for us
• The second argument to get() is the default value returned if the key does not exist (and no Traceback)

The long form:

if name in counts:
    x = counts[name]
else:
    x = 0

The same lookup with get():

x = counts.get(name, 0)

Example counts dictionary: {'csev': 2, 'zqian': 1, 'cwen': 2}
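
To check that the if/else form and get() agree, here is a small sketch in Python 3 syntax. The function name long_form and the missing name 'marquard' are assumptions made for the example.

```python
counts = {'csev': 2, 'zqian': 1, 'cwen': 2}

def long_form(counts, name):
    """The if/else version of the lookup, wrapped in a function."""
    if name in counts:
        x = counts[name]
    else:
        x = 0
    return x

# One name that is present and one that is not.
results = []
for name in ['csev', 'marquard']:
    results.append((long_form(counts, name), counts.get(name, 0)))

# get() only reads; it does not add the missing key to the dictionary.
missing = counts.get('marquard', 0)
print(results, missing, 'marquard' in counts)
```

Called without a second argument, get() returns None for a missing key instead of raising an error.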


Counting Pattern

counts = dict()
print 'Enter a line of text:'
line = raw_input('')
words = line.split()
print 'Words:', words
print 'Counting...'
for word in words:
    counts[word] = counts.get(word,0) + 1
print 'Counts', counts

The general pattern for counting the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently.
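
The same pattern can be sketched as a reusable function in Python 3 syntax. The function name count_words and the sample sentence are assumptions for illustration; the standard library's collections.Counter is shown only as a cross-check, not as the method these notes teach.

```python
from collections import Counter

def count_words(line):
    """Split a line into words and count each word with the get() idiom."""
    counts = dict()
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

line = 'the clown ran after the car and the car ran'
counts = count_words(line)
print('Counts', counts)

# Counter builds the same tallies in one call.
same_as_counter = dict(Counter(line.split())) == counts
```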


Definite Loops and Dictionaries


• Even though dictionaries are not stored in order, we can write a for loop that goes through all the entries in a dictionary - actually it goes through all of the keys in the dictionary and looks up the values

>>> counts = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for key in counts:
...     print key, counts[key]
...
jan 100
chuck 1
fred 42
>>>
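
When a predictable order matters, one option is to loop over the sorted keys. This is a sketch in Python 3 syntax, not part of the original transcript; the list named lines exists only to collect the output.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

lines = []
# sorted() returns the keys in alphabetical order, whatever order the dictionary uses.
for key in sorted(counts):
    lines.append('%s %d' % (key, counts[key]))

print('\n'.join(lines))
```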


Retrieving lists of Keys and Values


• You can get a list of keys, values, or items (both) from a dictionary

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print list(jjj)
['jan', 'chuck', 'fred']

>>> print jjj.keys()
['jan', 'chuck', 'fred']

>>> print jjj.values()
[100, 1, 42]

>>> print jjj.items()
[('jan', 100), ('chuck', 1), ('fred', 42)]
>>>
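
The transcript above is Python 2, where keys(), values() and items() return lists. In Python 3 they return view objects, so the sketch below wraps them in sorted() or list() to get real lists. The variable names are chosen for the example.

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

# sorted() accepts any iterable, including a view, and returns a list.
keys = sorted(jjj.keys())
values = sorted(jjj.values())
items = sorted(jjj.items())

# keys() and values() walk the dictionary in the same order,
# so pairing them up reproduces items().
paired = list(zip(jjj.keys(), jjj.values()))

print(keys, values, items)
```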


Bonus: Two Iteration Variables!


• We loop through the key-value pairs in a dictionary using *two* iteration variables
• Each iteration, the first variable is the key and the second variable is the corresponding value

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for aaa, bbb in jjj.items():
...     print aaa, bbb
...
jan 100
chuck 1
fred 42
>>>
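
One place where two iteration variables are useful is finding the entry with the largest value. The following is a sketch in Python 3 syntax; the names bigword and bigcount are assumptions for the example.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

bigword = None
bigcount = None
# word receives the key and count receives the value on each pass.
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
```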


items() vs. iteritems()


dict.items(): Return a copy of the dictionary’s list of (key, value) pairs.

dict.iteritems(): Return an iterator over the dictionary’s (key, value) pairs.

If I run the code below, each seems to return a reference to the same object. Are there any subtle differences that I am missing?

#!/usr/bin/python

d={1:'one',2:'two',3:'three'}
print 'd.items():'
for k,v in d.items():
   if d[k] is v: print '\tthey are the same object' 
   else: print '\tthey are different'

print 'd.iteritems():'   
for k,v in d.iteritems():
   if d[k] is v: print '\tthey are the same object' 
   else: print '\tthey are different'   

Output:

d.items():
    they are the same object
    they are the same object
    they are the same object
d.iteritems():
    they are the same object
    they are the same object
    they are the same object


It's part of an evolution.

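Originally, Python’s items() built a real list of tuples and returned it. That could take a lot of extra memory.

Then generators were introduced to the language, and the method was reimplemented as an iterator-generator method named iteritems(). The original remains for backwards compatibility.

One of Python 3’s changes is that items() now returns a view of the dictionary’s (key, value) pairs, so a list is never built unless you ask for one with list(). The iteritems() method is also gone, since items() in Python 3 works like viewitems() in Python 2.7.

In every case the values handed back are the same objects stored in the dictionary, which is why the test above prints the same result for both methods. The difference is only whether the pairs are collected into a list first.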
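
A short sketch of the Python 3 behaviour described above, assuming a Python 3 interpreter. The names view and snapshot are chosen for the example.

```python
d = {1: 'one', 2: 'two', 3: 'three'}

view = d.items()            # Python 3: a view object, no list is built
snapshot = list(d.items())  # an explicit copy, like Python 2's items()

d[4] = 'four'

# The view reflects the new entry; the snapshot does not.
print(len(view), len(snapshot))

# iteritems() no longer exists on Python 3 dictionaries.
has_iteritems = hasattr(d, 'iteritems')
```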


Best Practice


movies = list()

movie1 = dict()
movie1['Title'] = 'Avatar'
movie1['Rating'] = 'PG-13'
movies.append(movie1)

movie2 = dict()
movie2['Title'] = 'Matrix'
movie2['Ratng'] = 'PG-13'
movies.append(movie2)

Suppose the convention is to have the keys Title and Rating, but Rating is misspelled as Ratng.

What is a better way to validate lookups?
We can loop through the keys that are expected to be there.

keys = ['Title', 'Rating']

for item in movies:
    for key in keys:
        print(key + ' : ' + item[key])

The misspelling is caught in such a case: looking up 'Rating' in movie2 raises a KeyError instead of going unnoticed.
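
If you would rather get a report of the problems than a KeyError at the first bad record, one possible variation is sketched below. The names EXPECTED_KEYS, missing_keys and problems are hypothetical, and the movie data is copied from the example above.

```python
EXPECTED_KEYS = ['Title', 'Rating']

def missing_keys(record, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from record (hypothetical helper)."""
    return [key for key in expected if key not in record]

movies = [
    {'Title': 'Avatar', 'Rating': 'PG-13'},
    {'Title': 'Matrix', 'Ratng': 'PG-13'},  # misspelled key, as in the example above
]

# Map each problem movie's title to the keys it lacks.
problems = {}
for movie in movies:
    missing = missing_keys(movie)
    if missing:
        problems[movie.get('Title', '?')] = missing

print(problems)
```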




