09-Dictionaries
What is a Collection?
• A collection is nice because we can put more than one value in it and carry them all around in one convenient package
• We have a bunch of values in a single “variable”
• We do this by having more than one place “in” the variable
• We have ways of finding the different places in the variable
A Story of Two Collections
• List
> A linear collection of values that stay in order
• Dictionary
> A “bag” of values, each with its own label
Dictionaries
• Dictionaries are Python’s most powerful data collection
• Dictionaries allow us to do fast database-like operations in Python
• Dictionaries have different names in different languages
> Associative Arrays - Perl / PHP
> Properties or Map or HashMap - Java
> Property Bag - C# / .NET
• Lists index their entries based on the position in the list
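• Dictionaries are like bags - values are not found by position (in Python 2 the entries have no particular order; since Python 3.7 dictionaries keep insertion order, but you still look values up by key)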
• So we index the things we put in the dictionary with a “lookup tag”
>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print purse
{'money': 12, 'tissues': 75, 'candy': 3}
>>> print purse['candy']
3
>>> purse['candy'] = purse['candy'] + 2
>>> print purse
{'money': 12, 'tissues': 75, 'candy': 5}
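The same purse session can be sketched in Python 3 syntax, where print is a function. This is an illustrative variant, not part of the original slide; it also uses the += shorthand, which does the same thing as purse['candy'] = purse['candy'] + 2.

```python
# Build the purse dictionary one labeled entry at a time
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75

# Look up a value by its label (key), not by position
print(purse['candy'])        # 3

# Update in place; same as purse['candy'] = purse['candy'] + 2
purse['candy'] += 2
print(purse['candy'])        # 5
```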
Comparing Lists and Dictionaries
• Dictionaries are like lists except that they use keys instead of numbers to look up values
>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print lst
[21, 183]
>>> lst[0] = 23
>>> print lst
[23, 183]
>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print ddd
{'course': 182, 'age': 21}
>>> ddd['age'] = 23
>>> print ddd
{'course': 182, 'age': 23}
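One practical difference between the two: assigning to a list position that does not exist yet is an error, while assigning to a new dictionary key simply creates the entry. A minimal Python 3 sketch (the key 'room' is made up for illustration):

```python
lst = [21, 183]
ddd = {'age': 21, 'course': 182}

# A list only accepts positions that already exist
try:
    lst[2] = 99
except IndexError as err:
    list_error = str(err)
    print('list:', list_error)   # list: list assignment index out of range

# A dictionary creates a new entry when you assign to a new key
ddd['room'] = 99                 # hypothetical key, for illustration
print('dict:', ddd['room'])      # dict: 99
```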
Dictionary Literals (Constants)
• Dictionary literals use curly braces and have a list of key : value pairs
• You can make an empty dictionary using empty curly braces
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print jjj
{'jan': 100, 'chuck': 1, 'fred': 42}
>>> ooo = { }
>>> print ooo
{}
>>>
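A literal is just a shorter way to write the same dictionary you could build step by step. This Python 3 sketch shows that equality compares the key/value pairs, not the order they were written in, and that {} and dict() both give an empty dictionary:

```python
# A literal and a step-by-step build give equal dictionaries
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

built = dict()
built['jan'] = 100
built['chuck'] = 1
built['fred'] = 42

print(jjj == built)       # True: same key/value pairs, order ignored
print({} == dict())       # True: two ways to write an empty dictionary
print(len(jjj), len({}))  # 3 0
```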
Many Counters with a Dictionary
• One common use of dictionaries is counting how often we “see” something
>>> ccc = dict()
>>> ccc['csev'] = 1
>>> ccc['cwen'] = 1
>>> print ccc
{'csev': 1, 'cwen': 1}
>>> ccc['cwen'] = ccc['cwen'] + 1
>>> print ccc
{'csev': 1, 'cwen': 2}
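Extending the ccc example to a whole list of names, a minimal Python 3 sketch can use the in operator to decide between starting a new counter at 1 and adding 1 to an existing one. The function name count_names and the input list are hypothetical, for illustration only:

```python
def count_names(names):
    """Return a dictionary mapping each name to how many times it appears."""
    counts = dict()
    for name in names:
        if name in counts:
            counts[name] = counts[name] + 1   # seen before: add one
        else:
            counts[name] = 1                  # first sighting: start at one
    return counts

# Hypothetical input list, for illustration only
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
print(count_names(names))   # {'csev': 2, 'cwen': 2, 'zqian': 1}
```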
Dictionary Tracebacks
• It is an error to reference a key which is not in the dictionary
• We can use the in operator to see if a key is in the dictionary
>>> ccc = dict()
>>> print ccc['csev']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'csev'
>>> print 'csev' in ccc
False
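Two ways to avoid the KeyError traceback are sketched below in Python 3: checking with in first (as on the slide), or attempting the lookup and catching KeyError with try/except. The try/except form is a swapped-in alternative, not something the slide shows.

```python
ccc = dict()

# Option 1: check with the in operator before looking up
if 'csev' in ccc:
    value = ccc['csev']
else:
    value = 0
print(value)   # 0

# Option 2: attempt the lookup and handle the KeyError
try:
    value = ccc['csev']
except KeyError:
    value = 0
print(value)   # 0
```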
The get method for dictionaries
• This pattern of checking to see if a key is already in a dictionary and assuming a default value if the key is not there is so common that there is a method called get() that does this for us
if name in counts:
    x = counts[name]
else:
    x = 0

x = counts.get(name, 0)
• The second argument to get() is the default value used if the key does not exist (and no Traceback is raised)
• For example, if counts is {'csev': 2, 'zqian': 1, 'cwen': 2}, then counts.get('csev', 0) is 2 and counts.get('fred', 0) is 0
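A short Python 3 check of how get() behaves with and without a default, using the counts dictionary from this slide ('fred' is just a key that happens to be missing):

```python
counts = {'csev': 2, 'zqian': 1, 'cwen': 2}

print(counts.get('csev', 0))   # 2: key exists, its value is returned
print(counts.get('fred', 0))   # 0: key missing, the default is returned
print(counts.get('fred'))      # None: with no default, get() returns None

# get() only reads; it never adds the missing key to the dictionary
print('fred' in counts)        # False
```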
Counting Pattern
counts = dict()
print 'Enter a line of text:'
line = raw_input('')
words = line.split()
print 'Words:', words
print 'Counting...'
for word in words:
    counts[word] = counts.get(word,0) + 1
print 'Counts', counts
The general pattern to count the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently
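The same counting pattern can be wrapped in a function so it works on any string instead of reading from raw_input. This is a Python 3 sketch; the function name count_words and the sample sentence are hypothetical:

```python
def count_words(line):
    """Split a line into words and count each word with a dictionary."""
    counts = dict()
    for word in line.split():
        # get() supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words('the clown ran after the car and the car ran')
print(counts['the'])   # 3
print(counts['ran'])   # 2
```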
Definite Loops and Dictionaries
• Even though dictionaries are not stored in order, we can write a for loop that goes through all the entries in a dictionary - it actually goes through all of the keys, and we use each key to look up its value
>>> counts = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for key in counts:
...     print key, counts[key]
...
jan 100
chuck 1
fred 42
>>>
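Because the loop order above depends on how the dictionary happens to store its keys, a common trick is to loop over sorted(counts), which returns the keys as a new sorted list. A Python 3 sketch:

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

# sorted() returns the keys as a new sorted list; counts is unchanged
for key in sorted(counts):
    print(key, counts[key])
# chuck 1
# fred 42
# jan 100
```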
Retrieving lists of Keys and Values
• You can get a list of keys, values, or items (both) from a dictionary
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print list(jjj)
['jan', 'chuck', 'fred']
>>> print jjj.keys()
['jan', 'chuck', 'fred']
>>> print jjj.values()
[100, 1, 42]
>>> print jjj.items()
[('jan', 100), ('chuck', 1), ('fred', 42)]
>>>
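In Python 3, keys(), values() and items() return view objects rather than lists, so wrapping them in list() is the way to get a real list like the Python 2 output above. A minimal sketch (sorted() is used only to make the printed order predictable):

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

keys = list(jjj.keys())        # a real list of keys
values = list(jjj.values())    # a real list of values
items = list(jjj.items())      # a list of (key, value) tuples

print(sorted(keys))            # ['chuck', 'fred', 'jan']
print(sum(values))             # 143
print(sorted(items))           # [('chuck', 1), ('fred', 42), ('jan', 100)]
```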
Bonus: Two Iteration Variables!
• We loop through the key-value pairs in a dictionary using *two* iteration variables
• Each iteration, the first variable is the key and the second variable is the corresponding value for the key
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for aaa,bbb in jjj.items() :
...     print aaa, bbb
...
jan 100
chuck 1
fred 42
>>>
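Two iteration variables are handy whenever you need the key and the value together, for example to find which key has the largest value. A Python 3 sketch of that idea (the names best_key and best_value are illustrative):

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

best_key = None
best_value = None
for key, value in jjj.items():
    # Keep the pair with the largest value seen so far
    if best_value is None or value > best_value:
        best_key = key
        best_value = value

print(best_key, best_value)   # jan 100
```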
items() Vs iteritems()
dict.items(): Return a copy of the dictionary’s list of (key, value) pairs.
dict.iteritems(): Return an iterator over the dictionary’s (key, value) pairs.
If I run the code below, each seems to return a reference to the same object. Are there any subtle differences that I am missing?
#!/usr/bin/python
d = {1: 'one', 2: 'two', 3: 'three'}
print 'd.items():'
for k, v in d.items():
    if d[k] is v: print '\tthey are the same object'
    else: print '\tthey are different'
print 'd.iteritems():'
for k, v in d.iteritems():
    if d[k] is v: print '\tthey are the same object'
    else: print '\tthey are different'
Output:
d.items():
    they are the same object
    they are the same object
    they are the same object
d.iteritems():
    they are the same object
    they are the same object
    they are the same object
It's part of an evolution.
Originally, Python items() built a real list of tuples and returned that. That could potentially take a lot of extra memory.
Then, generators were introduced to the language in general, and that method was reimplemented as an iterator-generator method named iteritems(). The original remains for backwards compatibility.
One of Python 3’s changes is that items() now returns a view object that is iterated lazily, so a list is never fully built. The iteritems() method is also gone, since items() in Python 3 works like viewitems() in Python 2.7.
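The difference between a Python 3 view and a copied list can be seen directly: the view reflects later changes to the dictionary, while a list built from it is a snapshot. A minimal Python 3 sketch reusing the d from the question:

```python
d = {1: 'one', 2: 'two', 3: 'three'}

view = d.items()              # a view, not a copy
snapshot = list(d.items())    # an explicit copy as a list

d[4] = 'four'                 # change the dictionary after creating both

print(len(view))              # 4: the view sees the new entry
print(len(snapshot))          # 3: the list copy does not
print((4, 'four') in view)    # True: item views support membership tests
```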
Best Practice
movies = list()
movie1 = dict()
movie1['Title'] = 'Avatar'
movie1['Rating'] = 'PG-13'
movies.append(movie1)
movie2 = dict()
movie2['Title'] = 'Matrix'
movie2['Ratng'] = 'PG-13'
movies.append(movie2)
Suppose the convention is that every movie has the keys Title and Rating, but in movie2 Rating is misspelled as Ratng.
What is a better way to validate the lookups?
We can loop through the keys that are expected to be there.
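keys = ['Title', 'Rating']
for item in movies:
    for key in keys:
        if key in item:
            print(key + ' : ' + item[key])
        else:
            print('Missing key: ' + key)
The misspelling is caught in this case: instead of silently skipping the rating (or stopping with a KeyError), the loop prints "Missing key: Rating" for the second movie.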
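Going one step further, set operations on the keys can report both the missing key and the misspelled one in a single check. This is a Python 3 sketch under the assumption that every record should have exactly the keys Title and Rating; EXPECTED_KEYS and check_record are hypothetical names:

```python
EXPECTED_KEYS = {'Title', 'Rating'}   # assumed convention for every movie

def check_record(record):
    """Return (missing, unexpected) key sets for one movie dictionary."""
    present = set(record)
    missing = EXPECTED_KEYS - present
    unexpected = present - EXPECTED_KEYS
    return missing, unexpected

movies = [
    {'Title': 'Avatar', 'Rating': 'PG-13'},
    {'Title': 'Matrix', 'Ratng': 'PG-13'},   # misspelled key
]

for movie in movies:
    missing, unexpected = check_record(movie)
    if missing or unexpected:
        print(movie['Title'], 'missing:', sorted(missing),
              'unexpected:', sorted(unexpected))
# Matrix missing: ['Rating'] unexpected: ['Ratng']
```

Reporting the unexpected keys as well as the missing ones makes a typo like Ratng easy to spot, since it shows up on both sides of the check.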