How to Use defaultdict

A common problem that you can face when working with Python dictionaries is trying to access or modify keys that don’t exist in the dictionary. This raises a KeyError and interrupts your code’s execution. To handle these kinds of situations, the standard library provides the Python defaultdict type, a dictionary-like class that’s available in the collections module.

The Python defaultdict type behaves almost exactly like a regular Python dictionary, but if you try to access or modify a missing key, then defaultdict will automatically create the key and generate a default value for it. This makes defaultdict a valuable option for handling missing keys in dictionaries.

Handling Missing Keys in Dictionaries

A common issue that you can face when working with Python dictionaries is how to handle missing keys. If your code is heavily based on dictionaries, or if you’re creating dictionaries on the fly all the time, then you’ll soon notice that dealing with frequent KeyError exceptions can be quite annoying and can add extra complexity to your code. With Python dictionaries, you have at least four available ways to handle missing keys:

  1. Use .setdefault()

  2. Use .get()

  3. Use the key in dict idiom

  4. Use a try and except block

The Python docs explain .setdefault() and .get() as follows:

setdefault(key[, default])

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

get(key[, default])

Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.


Here’s an example of how you can use .setdefault() to handle missing keys in a dictionary:

>>> a_dict = {}
>>> a_dict['missing_key']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    a_dict['missing_key']
KeyError: 'missing_key'
>>> a_dict.setdefault('missing_key', 'default value')
'default value'
>>> a_dict['missing_key']
'default value'
>>> a_dict.setdefault('missing_key', 'another default value')
'default value'
>>> a_dict
{'missing_key': 'default value'}


In the above code, you use .setdefault() to generate a default value for missing_key. Notice that the dictionary, a_dict, now has a new key called missing_key whose value is 'default value'. This key didn’t exist before you called .setdefault(). Finally, if you call .setdefault() on an existing key, then the call won’t have any effect on the dictionary. The key will hold the original value instead of the new default value.

On the other hand, if you use .get(), then you can code something like this:

>>> a_dict = {}
>>> a_dict.get('missing_key', 'default value')
'default value'
>>> a_dict
{}


Here, you use .get() to generate a default value for missing_key, but this time, the dictionary stays empty. This is because .get() returns the default value, but this value isn’t added to the underlying dictionary. You can assume that .get() works something like this:

D.get(key, default) -> D[key] if key in D, else default
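As a rough sketch of that behavior, the function below mimics .get() in plain Python. The name get_or_default() is hypothetical and only used for illustration; it isn’t part of dict:

```python
def get_or_default(d, key, default=None):
    """Mimic dict.get(): return d[key] if key is present, else default."""
    if key in d:
        return d[key]
    return default


a_dict = {'present': 1}
print(get_or_default(a_dict, 'present', 0))  # 1
print(get_or_default(a_dict, 'missing', 0))  # 0
print(a_dict)                                # {'present': 1}
```

Like .get(), the helper never stores the default in the dictionary, so a_dict is unchanged after both lookups.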

You can also use conditional statements to handle missing keys in dictionaries. Take a look at the following example, which uses the key in dict idiom:

>>> a_dict = {}
>>> if 'key' in a_dict:
...     # Do something with 'key'...
...     a_dict['key']
... else:
...     a_dict['key'] = 'default value'
...
>>> a_dict
{'key': 'default value'}


In this code, you use an if statement along with the in operator to check if key is present in a_dict. If so, then you can perform any action with key or with its value. Otherwise, you create the new key, key, and assign it a 'default value'. Note that the above code works similarly to .setdefault() but takes four lines of code, while .setdefault() would only take one line (in addition to being more readable).
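For comparison, here’s a minimal sketch of the one-line .setdefault() equivalent. The variable name value is only there to show what the call returns:

```python
a_dict = {}

# One call replaces the if/else block: the key is created with the
# default only when it's missing, and the stored value is returned.
value = a_dict.setdefault('key', 'default value')

print(value)   # default value
print(a_dict)  # {'key': 'default value'}
```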

You can also work around the KeyError by using a try and except block to handle the exception. Consider the following piece of code:

>>> a_dict = {}
>>> try:
...     # Do something with 'key'...
...     a_dict['key']
... except KeyError:
...     a_dict['key'] = 'default value'
...
>>> a_dict
{'key': 'default value'}


The try and except block in the above example catches the KeyError whenever you try to get access to a missing key. In the except clause, you create the key and assign it a 'default value'.

Note: If missing keys are uncommon in your code, then you might prefer to use a try and except block (EAFP coding style) to catch the KeyError exception. This is because the code doesn’t check the existence of every key and only handles a few exceptions, if any.

On the other hand, if missing keys are quite common in your code, then the conditional statement (LBYL coding style) can be a better choice because checking for keys can be less costly than handling frequent exceptions.
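If you want to check this trade-off on your own machine, a sketch like the following can help. The function names, the sample data, and the repetition count are all assumptions made for illustration, and the timings you get will depend on your Python version and hardware:

```python
import timeit


def count_eafp(items):
    """Count items with a try and except block (EAFP)."""
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts


def count_lbyl(items):
    """Count items with a membership check (LBYL)."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts


few_misses = ['a'] * 10_000        # One missing key, then only hits
many_misses = list(range(10_000))  # Every key is missing on first access

for label, data in [('few misses', few_misses), ('many misses', many_misses)]:
    eafp = timeit.timeit(lambda: count_eafp(data), number=20)
    lbyl = timeit.timeit(lambda: count_lbyl(data), number=20)
    print(f'{label}: EAFP {eafp:.4f}s, LBYL {lbyl:.4f}s')
```

Both functions build the same dictionary, so the only thing that differs between them is how the missing key gets handled.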

So far, you’ve learned how to handle missing keys using the tools that dict and Python offer you. However, the examples you saw here are quite verbose and hard to read. They might not be as straightforward as you might want. That’s why the Python standard library provides a more elegant, Pythonic, and efficient solution. That solution is collections.defaultdict, and that’s what you’ll be covering from now on.

Understanding the Python defaultdict Type

The Python standard library provides collections, which is a module that implements specialized container types. One of those is the Python defaultdict type, which is an alternative to dict that’s specifically designed to help you out with missing keys. defaultdict is a Python type that inherits from dict:

>>> from collections import defaultdict
>>> issubclass(defaultdict, dict)
True


The above code shows that the Python defaultdict type is a subclass of dict. This means that defaultdict inherits most of the behavior of dict. So, you can say that defaultdict is much like an ordinary dictionary.

The main difference between defaultdict and dict is that when you try to access or modify a key that’s not present in the dictionary, a default value is automatically given to that key. In order to provide this functionality, the Python defaultdict type does two things:

  1. It overrides .__missing__().

  2. It adds .default_factory, a writable instance variable that you normally provide at the time of instantiation. If you don’t, then it defaults to None.

The instance variable .default_factory will hold the first argument passed into defaultdict.__init__(). This argument can take a valid Python callable or None. If a callable is provided, then it’ll automatically be called by defaultdict whenever you try to access or modify the value associated with a missing key.
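To make the mechanism more concrete, here’s a simplified sketch of how a dict subclass could combine .__missing__() and .default_factory. The class name SketchDefaultDict is hypothetical, and this is only an approximation of the behavior, not the actual implementation of defaultdict:

```python
class SketchDefaultDict(dict):
    """Hypothetical, simplified imitation of defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError('first argument must be callable or None')
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict's square-bracket lookup calls this hook for missing keys.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value  # Store the default so later lookups find it
        return value


sketch = SketchDefaultDict(list)
sketch['numbers'].append(1)
print(sketch)  # {'numbers': [1]}

no_factory = SketchDefaultDict()
try:
    no_factory['missing']
except KeyError as error:
    print('KeyError:', error)  # KeyError: 'missing'
```

The second part of the example shows what happens when .default_factory is None: a missing key raises a KeyError, just like in a regular dictionary.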

Note: All the remaining arguments to the class initializer are treated as if they were passed to the initializer of regular dict, including the keyword arguments.
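As a short illustration of that note, the sketch below passes a mapping and a keyword argument after the factory. The keys 'a', 'b', and 'c' are arbitrary placeholders:

```python
from collections import defaultdict

# The first argument is the factory. Everything after it is handled
# the same way dict() would handle it.
dd = defaultdict(list, {'a': [1]}, b=[2])

print(dd['a'])             # [1]
print(dd['b'])             # [2]
print(dd['c'])             # []
print(dd.default_factory)  # <class 'list'>
```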

Take a look at how you can create and properly initialize a defaultdict:

>>> # Correct instantiation
>>> def_dict = defaultdict(list)  # Pass list to .default_factory
>>> def_dict['one'] = 1  # Add a key-value pair
>>> def_dict['missing']  # Accessing a missing key returns an empty list
[]
>>> def_dict['another_missing'].append(4)  # Modify a missing key
>>> def_dict
defaultdict(<class 'list'>, {'one': 1, 'missing': [], 'another_missing': [4]})


Here, you pass list to .default_factory when you create the dictionary. Then, you use def_dict just like a regular dictionary. Note that when you try to access or modify the value mapped to a non-existent key, the dictionary assigns it the default value that results from calling list().
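One detail that’s worth a quick sketch: .default_factory only runs through .__missing__(), which is triggered by square-bracket access. Lookups with .get() or the in operator don’t create keys:

```python
from collections import defaultdict

def_dict = defaultdict(list)

print(def_dict.get('missing'))  # None
print('missing' in def_dict)    # False
print(len(def_dict))            # 0

def_dict['missing']             # Square brackets trigger .default_factory
print(len(def_dict))            # 1
```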

Keep in mind that you must pass a valid Python callable object to .default_factory, so remember not to call it using the parentheses at initialization time. This can be a common issue when you start using the Python defaultdict type. Take a look at the following code:

>>> # Wrong instantiation
>>> def_dict = defaultdict(list())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    def_dict = defaultdict(list())
TypeError: first argument must be callable or None


Here, you try to create a defaultdict by passing list() to .default_factory. The call to list() returns an empty list, which isn’t callable, so defaultdict raises a TypeError telling you that the first argument must be callable or None.
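The difference between the class and an instance of it can be checked with callable(). If you need a default that isn’t empty, then one possible approach is to wrap it in a zero-argument callable such as a lambda. The 'start' and 'next' values below are placeholders for illustration:

```python
from collections import defaultdict

# list is callable, but the empty list that list() returns isn't.
print(callable(list))    # True
print(callable(list()))  # False

# A lambda gives defaultdict a callable that builds a fresh,
# non-empty list for every missing key.
def_dict = defaultdict(lambda: ['start'])
def_dict['missing'].append('next')
print(def_dict['missing'])  # ['start', 'next']
```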

With this introduction to the Python defaultdict type, you can start coding with practical examples. The next few sections will walk you through some common use cases where you can rely on a defaultdict to provide an elegant, efficient, and Pythonic solution.

Using the Python defaultdict Type

Sometimes, you’ll use a mutable built-in collection (a list, dict, or set) as values in your Python dictionaries. In these cases, you’ll need to initialize the keys before first use, or you’ll get a KeyError. You can either do this process manually or automate it using a Python defaultdict. In this section, you’ll learn how to use the Python defaultdict type for solving some common programming problems:

  • Grouping the items in a collection

  • Counting the items in a collection

  • Accumulating the values in a collection

You’ll be covering some examples that use list, set, int, and float to perform grouping, counting, and accumulating operations in a user-friendly and efficient way.

Grouping Items

A typical use of the Python defaultdict type is to set .default_factory to list and then build a dictionary that maps keys to lists of values. With this defaultdict, if you try to get access to any missing key, then the dictionary runs the following steps:

  1. Call list() to create a new empty list

  2. Insert the empty list into the dictionary using the missing key as key

  3. Return a reference to that list

This allows you to write code like this:

>>> from collections import defaultdict
>>> dd = defaultdict(list)
>>> dd['key'].append(1)
>>> dd
defaultdict(<class 'list'>, {'key': [1]})
>>> dd['key'].append(2)
>>> dd
defaultdict(<class 'list'>, {'key': [1, 2]})
>>> dd['key'].append(3)
>>> dd
defaultdict(<class 'list'>, {'key': [1, 2, 3]})


Here, you create a Python defaultdict called dd and pass list to .default_factory. Notice that even when key isn’t defined, you can append values to it without getting a KeyError. That’s because dd automatically calls .default_factory to generate a default value for the missing key.
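To watch those steps happen, you can swap list for a factory that records each call. The tracked_list() function and the calls list are hypothetical helpers written only for this sketch:

```python
from collections import defaultdict

calls = []


def tracked_list():
    """Hypothetical factory that records each call, then returns a list."""
    calls.append('called')
    return []


dd = defaultdict(tracked_list)

first = dd['key']       # Missing key: the factory runs and the list is stored
second = dd['key']      # Existing key: the factory isn't called again

print(len(calls))       # 1
print(first is second)  # True
first.append(1)
print(dd['key'])        # [1]
```

Because the dictionary returns a reference to the stored list, appending through first changes the value that dd['key'] returns.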

You can use defaultdict along with list to group the items in a sequence or a collection. Suppose that you’ve retrieved the following data from your company’s database:

Department    Employee Name
----------    ---------------
Sales         John Doe
Sales         Martin Smith
Accounting    Jane Doe
Marketing     Elizabeth Smith
Marketing     Adam Doe

With this data, you create an initial list of tuple objects like the following:

dep = [('Sales', 'John Doe'),
       ('Sales', 'Martin Smith'),
       ('Accounting', 'Jane Doe'),
       ('Marketing', 'Elizabeth Smith'),
       ('Marketing', 'Adam Doe')]


Now, you need to create a dictionary that groups the employees by department. To do this, you can use a defaultdict as follows:

from collections import defaultdict

dep_dd = defaultdict(list)
for department, employee in dep:
    dep_dd[department].append(employee)


Here, you create a defaultdict called dep_dd and use a for loop to iterate through your dep list. The statement dep_dd[department].append(employee) creates the keys for the departments, initializes them to an empty list, and then appends the employees to each department. Once you run this code, your dep_dd will look something like this:

defaultdict(<class 'list'>, {'Sales': ['John Doe', 'Martin Smith'],
                             'Accounting': ['Jane Doe'],
                             'Marketing': ['Elizabeth Smith', 'Adam Doe']})


In this example, you group the employees by their department using a defaultdict with .default_factory set to list. To do this with a regular dictionary, you can use dict.setdefault() as follows:

dep_d = dict()
for department, employee in dep:
    dep_d.setdefault(department, []).append(employee)


This code is straightforward, and you’ll find similar code quite often in your work as a Python coder. However, the defaultdict version is arguably more readable, and for large datasets, it can also be faster, because .setdefault() has to build a new empty list on every call, even when the key already exists. So, if speed is a concern for you, then consider using a defaultdict instead of a standard dict, and measure the difference on your own data.
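One way to measure the difference is a small timeit comparison like the sketch below. The generated dataset, the function names, and the repetition count are assumptions for illustration, and the numbers you see will vary between machines and Python versions:

```python
import timeit
from collections import defaultdict

# Hypothetical sample data: 100,000 (department, employee) pairs.
departments = ['Sales', 'Accounting', 'Marketing']
dep = [(departments[i % 3], f'Employee {i}') for i in range(100_000)]


def group_with_defaultdict(pairs):
    """Group employees by department using defaultdict(list)."""
    groups = defaultdict(list)
    for department, employee in pairs:
        groups[department].append(employee)
    return groups


def group_with_setdefault(pairs):
    """Group employees by department using dict.setdefault()."""
    groups = {}
    for department, employee in pairs:
        groups.setdefault(department, []).append(employee)
    return groups


# Both approaches build the same mapping.
assert group_with_defaultdict(dep) == group_with_setdefault(dep)

for func in (group_with_defaultdict, group_with_setdefault):
    seconds = timeit.timeit(lambda: func(dep), number=10)
    print(f'{func.__name__}: {seconds:.3f}s')
```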