

What is a Collection?

• A collection is nice because we can put more than one value in it and carry them all around in one convenient package
• We have a bunch of values in a single “variable”
• We do this by having more than one place “in” the variable
• We have ways of finding the different places in the variable

A Story of Two Collections

• List
> A linear collection of values that stay in order

• Dictionary
> A “bag” of values, each with its own label


• Dictionaries are Python’s most powerful data collection
• Dictionaries allow us to do fast database-like operations in Python
• Dictionaries have different names in different languages
> Associative Arrays - Perl / PHP
> Properties or Map or HashMap - Java
> Property Bag - C# / .Net

• Lists index their entries based on the position in the list
• Dictionaries are like bags - no order
• So we index the things we put in the dictionary with a “lookup tag”

>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print purse
{'money': 12, 'tissues': 75, 'candy': 3}
>>> print purse['candy']
3

>>> purse['candy'] = purse['candy'] + 2
>>> print purse
{'money': 12, 'tissues': 75, 'candy': 5}

Comparing Lists and Dictionaries

• Dictionaries are like lists except that they use keys instead of numbers to look up values

>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print lst
[21, 183]

>>> lst[0] = 23
>>> print lst
[23, 183]

>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print ddd
{'course': 182, 'age': 21}

>>> ddd['age'] = 23
>>> print ddd
{'course': 182, 'age': 23}

Dictionary Literals (Constants)

• Dictionary literals use curly braces and have a list of key : value pairs
• You can make an empty dictionary using empty curly braces

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print jjj
{'jan': 100, 'chuck': 1, 'fred': 42}

>>> ooo = { }
>>> print ooo
{}

Many Counters with a Dictionary

• One common use of dictionaries is counting how often we “see” something

>>> ccc = dict()
>>> ccc['csev'] = 1
>>> ccc['cwen'] = 1
>>> print ccc
{'csev': 1, 'cwen': 1}

>>> ccc['cwen'] = ccc['cwen'] + 1
>>> print ccc
{'csev': 1, 'cwen': 2}

Dictionary Tracebacks

• It is an error to reference a key which is not in the dictionary
• We can use the in operator to see if a key is in the dictionary

>>> ccc = dict()
>>> print ccc['csev']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'csev'

>>> print 'csev' in ccc
False
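
A minimal sketch of guarding a lookup with the in operator, so that a missing key never triggers the KeyError shown in the traceback. The dictionary contents and the variable name count are illustrative assumptions, and print is used as a function so the snippet runs under Python 3.

```python
# Hypothetical counter dictionary; the contents are made up for illustration.
ccc = {'csev': 1}

# Check membership first so a missing key cannot raise KeyError.
if 'cwen' in ccc:
    count = ccc['cwen']
else:
    count = 0

print(count)          # the count to use for 'cwen', which is not a key yet
print('csev' in ccc)  # membership is tested against the keys
```

The in operator looks only at keys, never at values, so testing `1 in ccc` here would be False.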

The get method for dictionaries

• The pattern of checking whether a key is already in a dictionary, and assuming a default value if it is not, is so common that there is a method called get() that does it for us
• get() returns a default value if the key does not exist (and no Traceback)

These two forms are equivalent:

if name in counts:
    x = counts[name]
else:
    x = 0

x = counts.get(name, 0)

Counting names this way produces a dictionary such as {'csev': 2, 'zqian': 1, 'cwen': 2}.
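
One way to arrive at a dictionary of name counts with get() can be sketched as follows. The names list is an assumption made up for illustration (it is not given in these notes), and print is used as a function so the snippet runs under Python 3.

```python
# Hypothetical list of names; chosen only for illustration.
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']

counts = dict()
for name in names:
    # get() returns 0 the first time a name is seen, so no KeyError is raised.
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

The single get() call replaces the four-line if/else, which is why this form is usually preferred for counters.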

Counting Pattern

counts = dict()
print 'Enter a line of text:'
line = raw_input('')
words = line.split()
print 'Words:', words
print 'Counting...'
for word in words:
    counts[word] = counts.get(word,0) + 1
print 'Counts', counts

The general pattern for counting the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently.
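
The same pattern can be wrapped in a function so it can be tried without typing input. This is only a sketch: the function name count_words and the sample sentence are assumptions for illustration, and print is used as a function so the snippet runs under Python 3.

```python
def count_words(line):
    """Return a dictionary mapping each word in line to how often it occurs."""
    counts = dict()
    for word in line.split():
        # Start from 0 for a new word, then add one for this occurrence.
        counts[word] = counts.get(word, 0) + 1
    return counts

# Hypothetical sample sentence, used in place of keyboard input.
result = count_words('the clown ran after the car and the car ran into the tent')
print(result)
```

Note that split() separates on whitespace only, so punctuation attached to a word would be counted as part of that word.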

Definite Loops and Dictionaries

• Even though dictionaries are not stored in order, we can write a for loop that goes through all the entries in a dictionary
• The loop actually goes through the keys of the dictionary, and we use each key to look up its value

>>> counts = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for key in counts:
...     print key, counts[key]
...
jan 100
chuck 1
fred 42
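
Because the loop order is not something to rely on here, a common variation is to loop over the sorted keys instead. A minimal sketch, assuming alphabetical order is what is wanted; the list name lines is illustrative, and print is used as a function so the snippet runs under Python 3.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

# sorted() returns the keys as a new list in alphabetical order.
lines = []
for key in sorted(counts):
    lines.append(key + ' ' + str(counts[key]))

print('\n'.join(lines))
```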

Retrieving lists of Keys and Values

• You can get a list of keys, values, or items (both) from a dictionary

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print list(jjj)
['jan', 'chuck', 'fred']

>>> print jjj.keys()
['jan', 'chuck', 'fred']

>>> print jjj.values()
[100, 1, 42]

>>> print jjj.items()
[('jan', 100), ('chuck', 1), ('fred', 42)]
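
The three results line up with one another: as long as the dictionary is not modified in between, the first key goes with the first value, and so on. A small sketch of that relationship, with list() wrapped around each call so the snippet also runs under Python 3 (where these methods do not return plain lists); the variable names are illustrative.

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

keys = list(jjj.keys())
values = list(jjj.values())
items = list(jjj.items())

# Pairing keys with values position by position reproduces items.
paired = list(zip(keys, values))
print(paired == items)
```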

Bonus: Two Iteration Variables!

• We loop through the key-value pairs in a dictionary using *two* iteration variables
• Each iteration, the first variable is the key and the second variable is the corresponding value

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for aaa, bbb in jjj.items():
...     print aaa, bbb
...
jan 100
chuck 1
fred 42
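
Two iteration variables are handy when the loop needs both the key and the value, for example to find the entry with the largest count. A minimal sketch; the names bigword and bigcount are assumptions for illustration, and print is used as a function so the snippet runs under Python 3.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

bigword = None
bigcount = None
for word, count in counts.items():
    # Keep the pair with the largest count seen so far.
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword)
print(bigcount)
```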

items() vs iteritems()

dict.items(): Return a copy of the dictionary’s list of (key, value) pairs.

dict.iteritems(): Return an iterator over the dictionary’s (key, value) pairs.

If I run the code below, each seems to return a reference to the same object. Are there any subtle differences that I am missing?


d = {'one': 1, 'two': 2, 'three': 3}  # sample dictionary (assumed; not given in the original)

print 'd.items():'
for k, v in d.items():
    if d[k] is v:
        print '\tthey are the same object'
    else:
        print '\tthey are different'

print 'd.iteritems():'
for k, v in d.iteritems():
    if d[k] is v:
        print '\tthey are the same object'
    else:
        print '\tthey are different'


d.items():
    they are the same object
    they are the same object
    they are the same object
d.iteritems():
    they are the same object
    they are the same object
    they are the same object

It's part of an evolution.

Originally, Python's items() built a real list of tuples and returned it, which could take a lot of extra memory.

Then, generators were introduced to the language in general, and that method was reimplemented as an iterator-generator method named iteritems(). The original remains for backwards compatibility.

One of Python 3’s changes is that items() now returns a view object rather than a list, so a list is never fully built. The iteritems() method is also gone, since items() in Python 3 works like viewitems() in Python 2.7.
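
The practical difference shows up when the dictionary changes after items() is called. The sketch below assumes Python 3; under Python 2 both variables would hold plain lists and both lengths would stay at 2. The names snapshot and view are illustrative.

```python
d = {'one': 1, 'two': 2}

snapshot = list(d.items())   # a real list: a copy made at this moment
view = d.items()             # Python 3: a view that tracks the dictionary

d['three'] = 3               # change the dictionary afterwards

print(len(snapshot))         # the copy does not see the new entry
print(len(view))             # the view does (Python 3 behaviour)
```

This is also why the identity test in the question cannot tell the methods apart: both hand back the very same value objects, and they differ only in how the pairs are produced.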

Best Practice

movies = list()

movie1 = dict()
movie1['Title'] = 'Avatar'
movie1['Rating'] = 'PG-13'
movies.append(movie1)

movie2 = dict()
movie2['Title'] = 'Matrix'
movie2['Ratng'] = 'PG-13'
movies.append(movie2)

Suppose the convention is that every movie has the keys Title and Rating, but Rating has been misspelled as Ratng in movie2.

What is a better way to validate lookups? We can loop through the keys that are expected to be there.

keys = ['Title', 'Rating']

for item in movies:
    for key in keys:
        print(key + ' : ' + item[key])

The misspelling cannot go unnoticed in this case: looking up item['Rating'] on movie2 raises a KeyError.
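
If a report is preferable to a Traceback, the same idea can be sketched as a small check that lists which expected keys are absent. The helper name missing_keys is hypothetical, not part of Python or of these notes, and print is used as a function so the snippet runs under Python 3.

```python
def missing_keys(record, expected):
    """Return the expected keys that record does not contain."""
    return [key for key in expected if key not in record]

expected = ['Title', 'Rating']
movies = [
    {'Title': 'Avatar', 'Rating': 'PG-13'},
    {'Title': 'Matrix', 'Ratng': 'PG-13'},   # misspelled key, as in the text
]

for movie in movies:
    problems = missing_keys(movie, expected)
    if problems:
        # Report the record instead of letting a later lookup fail.
        print(movie['Title'] + ' is missing: ' + ', '.join(problems))
```

This version assumes every record at least has a Title; a record without one would still raise a KeyError when the message is built.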