Lecture 25

Today:

For Next Time You Should

NoSQL

NoSQL is usually read as "Not only SQL".  As such, it can be understood as a reaction against the ubiquity of relational databases (and, more specifically, those that use SQL).  There are many flavors of NoSQL databases, including key-value stores, document stores, column-family stores, and graph databases.

The emergence of NoSQL as a phenomenon is best understood within the context of changing requirements faced by modern databases along a number of axes.

Horizontal versus vertical scaling:

Vertical scaling means adding more resources (CPU, memory, disk) to a single node.  Users have traditionally scaled up the capacity of relational databases through vertical scaling.  As the requirements for amount of data and performance increase, the cost of continuously scaling vertically becomes prohibitive.  Additionally, this type of scaling is less agile, since capacity can only be added by upgrading or replacing the one machine.

Horizontal scaling means adding more nodes.  One of the most commonly cited advantages of NoSQL databases over relational ones is that they are designed to take advantage of horizontal scaling.  This makes it possible to do really cool things like building very large distributed databases on cloud computing services such as Amazon's EC2.
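To make horizontal scaling concrete, here is a minimal sketch of one way data can be spread across nodes: hash each key and use the hash to pick a node.  Everything here (the ShardedStore class, the shard_for function, and the use of plain dicts to stand in for nodes) is made up for illustration and is not how any particular NoSQL database does it.

```python
import hashlib


def shard_for(key, num_nodes):
    """Map a string key to a node index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class ShardedStore:
    """Toy key-value store; each 'node' is just an in-memory dict."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, value):
        # Only the node that owns the key is touched.
        self.nodes[shard_for(key, len(self.nodes))][key] = value

    def get(self, key):
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


sharded = ShardedStore(num_nodes=3)
for name in ["Jack London", "Honore de Balzac", "Emile Zola", "Truman Capote"]:
    sharded.put(name, {"Name": name})

print(sharded.get("Emile Zola"))
print([len(node) for node in sharded.nodes])  # how many keys landed on each node
```

Note that with simple modulo hashing, changing the number of nodes moves most keys to a different node.  Real systems typically use something smarter (for example, consistent hashing) so that adding a node only moves a small fraction of the data.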

Fixed versus dynamic database schemas:

A database schema is a formal description of the database structure.  In relational databases this typically must be specified in advance.  For instance, when creating a table in SQL (as we did last time), you would use something like:

CREATE TABLE Writers(Id INT PRIMARY KEY AUTO_INCREMENT, Name VARCHAR(25))

Notice that the column names and their types have to be specified before any data can be stored.  This is not typically the case for NoSQL databases.  The increased need for this type of flexible database structure has been attributed to increased adoption of agile development methods.
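Here is a small sketch of the difference, using Python's built-in sqlite3 module for the fixed-schema side and a plain list of dicts to stand in for a schemaless document collection.  Two assumptions to flag: SQLite spells auto-increment differently from the statement above (INTEGER PRIMARY KEY AUTOINCREMENT), and the Country and Books fields are invented for this example.

```python
import sqlite3

# Fixed schema: the columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Writers(Id INTEGER PRIMARY KEY AUTOINCREMENT, Name VARCHAR(25))"
)
conn.execute("INSERT INTO Writers(Name) VALUES (?)", ("Jack London",))

# Storing a field that was not declared fails until the table is altered.
try:
    conn.execute(
        "INSERT INTO Writers(Name, Country) VALUES (?, ?)", ("Emile Zola", "France")
    )
    fixed_schema_rejected = False
except sqlite3.OperationalError as error:
    fixed_schema_rejected = True
    print("SQL insert failed:", error)

# Dynamic schema: each "document" carries its own fields.
writers = []
writers.append({"Name": "Jack London"})
writers.append({"Name": "Emile Zola", "Country": "France"})
writers.append({"Name": "Truman Capote", "Books": ["In Cold Blood"]})

for doc in writers:
    print(sorted(doc.keys()))
```

The flip side of this flexibility is that nothing stops two documents from disagreeing about what fields exist, so the application code has to cope with missing or extra fields.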

Built-in Caching and Replication:

There are lots of add-on layers for relational databases that provide features such as caching of frequently accessed data and replicating data across nodes to ensure high availability.  With NoSQL databases, these features are typically built in from the ground up.
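To give a feel for what these two features do, here is a toy sketch in which every write is copied to several replicas and reads go through a cache.  The ReplicatedStore class and its failed parameter (used to simulate replicas that are down) are hypothetical; real databases handle replication and caching in far more sophisticated ways.

```python
class ReplicatedStore:
    """Toy store: writes go to every replica, reads go through a cache."""

    def __init__(self, num_replicas=3):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.cache = {}
        self.replica_reads = 0  # counts reads that had to go to a replica

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value
        # Drop any cached copy so later reads do not see a stale value.
        self.cache.pop(key, None)

    def get(self, key, failed=()):
        """Read a key; `failed` lists replica indices to treat as down."""
        if key in self.cache:
            return self.cache[key]
        for index, replica in enumerate(self.replicas):
            if index in failed:
                continue  # skip replicas that are unavailable
            self.replica_reads += 1
            if key in replica:
                self.cache[key] = replica[key]
            return replica.get(key)
        raise RuntimeError("no replica available")


replicated = ReplicatedStore(num_replicas=3)
replicated.put("w1", {"Name": "Jack London"})

first = replicated.get("w1", failed={0})  # replica 0 is down, replica 1 answers
second = replicated.get("w1")             # served from the cache
print(first, second, replicated.replica_reads)
```

Replication is what lets the read succeed even though one node is down, and the cache is what lets the second read skip the replicas entirely.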

What do we have to give up?

ACID (Atomicity, Consistency, Isolation, and Durability).  This has been the gold standard for databases for a long time, but because of the importance of the previously mentioned factors, it is no longer considered a requirement for every application.  Many NoSQL databases relax one or more of these guarantees in exchange for scalability and availability.
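To see what is being given up, here is a sketch of atomicity using Python's built-in sqlite3 module: either both halves of a transfer happen or neither does.  The Accounts table, the names, the balances, and the transfer function are all invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts(Name TEXT PRIMARY KEY, "
    "Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE Accounts SET Balance = Balance + ? WHERE Name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE Name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative; the credit is undone too.
        return False


succeeded = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT Name, Balance FROM Accounts"))
print(succeeded, rejected, balances)
```

In the second transfer the credit to bob is applied first and then rolled back when the debit fails, so no money is created.  In a database without multi-statement transactions, the application would have to detect and repair that half-finished state itself.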

The space of NoSQL databases is vast.  Check out this chart from Big Data News, Views, and Reviews.

While there is no single obvious NoSQL database to show in class, I have decided to choose MongoDB because it is very flexible and powerful.

MongoDB Tutorial

I have put together a MongoDB IPython notebook.  It is in the repo from last time (https://github.com/paulruvolo/DatabaseTutorials.git).