Relational Databases: Indexes

Indexes are special data structures set up within a database  to make queries run faster.

Consider the following table:
id	age	last_name	hometown
-- -- -- --
1 10 Johnson San Francisco, CA
2 27 Smith San Joe, CA
3 15 Rose Palo Alto, CA
4 64 Farmer Mill Valley, CA
5 55 Pauling San Francisco, CA
6 17 Smith Oakland, CA
... ... ... ...
100 49 Meyer Berkeley, CA
101 30 Wayne Monterey, CA
102 18 Schwartz San Francisco, CA
104 6 Johnson San Francisco, CA
... ... ... ...
10000 41 Fetterman Mountain View, CA
10001 25 Breyer Redwood City, CA

To find all the records with a particular last name, e.g., Johnson, the database engine would have to compare every record in the table with "Johnson".

Indexes are auxiliary data structures that reorganize the data using the fields that are part of the query. An index for a last_name query would have only the last_name and id fields:

Breyer                10001
...
Fetterman          10000
....
Johnson                  1
Johnson                  104
...

With the data reconfigured, the database can find all the records with a particular name faster. In fact, using a B-Tree data structure, performance can be better than O(log n). This means that if there are 64 records, it can find the data with at most  6 comparisons. Here's a sample B-Tree, organized by some field of type integer:

diagram from http://www.bluerwhite.org/btree/

With App Engine, you may find that some queries will not even work unless you set up indexes for them. App Engine generates simple indexes for you and specifies them in a file called index.yaml. For more complex queries , you may have to explicitly add indexes to the index.yaml file. Here's what the App Engine docs (http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html) say:

App Engine provides automatic indexes for the following forms of queries:

  • queries using only equality and ancestor filters
  • queries using only inequality filters (which can only be of a single property)
  • queries with no filters and only one sort order on a property, either ascending or descending
  • queries using equality filters on properties and inequality or range filters on keys

Other forms of queries require their indexes to be specified in index.yaml, including:

  • queries with multiple sort orders
  • queries with a sort order on keys in descending order
  • queries with one or more inequality filters on a property and one or more equality filters over other properties
  • queries with inequality filters and ancestor filters
App Engine also provides the following tip:

Tip: If during testing the app exercises every query it will make using the development web server, then the generated entries in index.yaml will be complete. You only need to edit the file manually to delete indexes that are no longer used, or to define indexes not created by the development web server.

If you do to set up an index manually, you can modify the index.yaml file. For instance,  consider the following query:

SELECT * FROM Person WHERE last_name = "Smith" AND height < 72 ORDER BY height DESC

It has a one equality filter and one inequality filter, and it has a desc ordering. The index for it, in the index.yaml file, would be defined as:
indexes:
- kind: Person
properties:
- name: last_name
- name: height
direction: desc






Recent site activity