8.3 DDL/DML

Specification

  • Show understanding that DBMS carries out all creation/modification of the database structure using its Data Definition Language (DDL)

  • Show understanding that the DBMS carries out all queries and maintenance of data using its DML

  • Show understanding that the industry standard for both DDL and DML is Structured Query Language (SQL)

    • Understand a given SQL script

  • Understand given SQL (DDL) commands and be able to write simple SQL (DDL) commands using a sub-set of commands

    • Create a database (CREATE DATABASE)

    • Create a table definition (CREATE TABLE), including the creation of attributes with appropriate data types:

      • CHARACTER

      • VARCHAR(n)

      • BOOLEAN

      • INTEGER

      • REAL

      • DATE

      • TIME

    • Change a table definition (ALTER TABLE)

    • Add a primary key to a table (PRIMARY KEY (field))

    • Add a foreign key to a table (FOREIGN KEY (field) REFERENCES Table (Field))

  • Write an SQL script to query or modify data (DML) which are stored in (at most two) database tables

    • Queries including SELECT...FROM, WHERE, ORDER BY, GROUP BY, INNER JOIN, SUM, COUNT, AVG

    • Data maintenance including INSERT INTO, DELETE FROM, UPDATE

Data Definition Language and Data Manipulation Language

SQL is a language written to enable manipulation of databases using a scripting language. In terms of programming paradigms, it is a declarative language (like Prolog, HTML) in that it specifies what you want but not how. This is often the most challenging part of the exam, so ensure you complete the worksheet to gain experience with writing SQL statements.

DDL & DML

Introduction

Broadly speaking, there are two main functioning parts to a DBMS. These are the DDL and the DML.

A very useful website to use for looking up SQL commands is the W3Schools site. I have also created a SQL cheat sheet, which you can download at the bottom.

Finally, there are a number of videos which go through many of the SQL concepts you need to know, in video form. I suggest you read the text below, as this relates directly to the CIE course, rather than the videos which also cover exam boards such as OCR. While much of the content is the same, there may be some differences. Disclaimer: The videos at the bottom (under Video Instructions) are from a trusted content creating teacher, but I have not watched them for accuracy.

Data Definition Language (DDL)

The Data Definition Language (DDL) is one part of the DBMS. It is used by the designer of a database rather than a user. The DDL is used to set up the structure of a database as well as for maintaining the database. For example, using the DDL, they can set up tables of data about data (names of attributes, data types, validation rules etc). The data about data is often called ‘meta-data’. The DDL is also used to set up the security systems in the database. This is done using the DDL by describing the whole database (the ‘schema’) and then describing data in parts of the database (the ‘sub-schema’). Once this has been done, the designer can control access to each of the sub-schema. For example, if a receptionist needs to know each employee’s department, extension and job title, then a sub-schema will be defined with these data items in it. The receptionist is then given access to that sub-schema (she is allowed to view that data on her screen) but not other sub-schema. That means she cannot, for example, access how much an employee earns, or the contents of their personnel file. Using the DDL, then, different users of a database can be allowed to access certain sub-schema but not others. Users can be given different ‘views’ of the same data in a database.

Data Manipulation Language (DML)

The DML is another important part of the DBMS. It is commonly used by the users of a database to access and update data. This language allows users to actually work on the data in your database. They could, for example, write programs that add, delete, modify or retrieve data! The DML is a high level language. This ensures that minimal programming skills are necessary to actually use it.

Structured Query Language (SQL)

The industry standard for DDL and DML is called Structured Query Language, or SQL. We will get some experience using SQL in another section.

Structured Query Language (SQL) - getting started

Introduction

SQL, or Structured Query Language, is a language that is used for gaining access to and manipulating tables of data. It is a ‘standard’ language in that it has been defined by the American National Standards Institute (ANSI) and so will work with many different applications such as Access, Informix and Oracle, for example. Although there are a number of different SQL versions in existence, they all support the basic ANSI standard and so all can carry out certain functions in the same way, such as SELECT and DELETE. We will see examples of these later.

How to get experience of SQL

There are some online simulators, where you can enter commands in SQL and then see the results. You could try the one here: http://www.w3schools.com/sql/ There are some very good notes on this website as well as others (use Google), and of course some excellent tutorials on YouTube.

A better way to explore would be to open up Access and use that. You need to open up the Queries section and switch into SQL. You should write and save one SQL job, and then run it. for example, write an SQL query to create a table, save it and then run it. When you run any other queries after creating a table, make sure any tables you are using are closed first, or you may get errors. Using SQL in Access does depend very much on what version of Access you are using, but your teacher will help you get started. Once you have written the first view SQL commands and run them, you will be able to experiment with lots of different instructions. The hardest part is just getting going!

NOTE: MS ACCESS uses YESNO for Boolean values. SQL DOES NOT use this data type. When creating SQL tables (in the exam), you need to use BIT for a Boolean value. 0 as FALSE, 1 as TRUE.

A dog club’s database

Here is a design for a dog club.

We must describe any one E-R relationship with two sentences. In this case, we have:

      • Each owner can own many dogs.

      • Each dog can be owned by only one owner.

Here are the two tables with some record in:

Using Access to do the examples

You must get some actual experience using SQL to really understand it. Use Access or any other relational database package. To try out the following examples in Microsoft Access, do the following.

      • In the database window (The database window is the screen you get when you first open your database, where you can see a window with a list on the left that says tables, queries, forms reports, pages etc.) click on Create and then Queries Design, and then don’t ADD any tables. Just CLOSE the pop-up box.

      • Now click the SQL design view. It should be the first icon on the ribbon bar or, <2007, under the FILE menu.

      • An SQL box should pop-up and you are ready to type in your queries. If you get stuck getting started, you will need to ask your peers or your teacher for help or refer to the HELP menu in Access or use the Internet.

IMPORTANT: When you do each query, you must be sure that you get the correct results. Just because you get some results does not mean they are correct. Whilst you are learning how to construct queries, you should always:

      • Predict the results of running a query before you actually run it.

      • Run the query.

      • Compare your prediction to the actual results obtained.

Important notes

If you are using Access to run SQL, be aware that this DOES NOT support the official SQL data types (ones the exam is looking for). For example, rather than variable character (VARCHAR) it uses TEXT. It also treats Boolean as YES/NO, rather than TRUE/FALSE.

Creating a database

You must create a database in SQL before you can start adding tables and then data to it. The syntax for creating a database is:

CREATE DATABASE database_name

To create a database for the Dog Club, we would use:

CREATE DATABASE dog_club

Creating a table

Once the database has been created, you can add tables to it. You can create a table in SQL using the CREATE TABLE command. The syntax for this command is as follows:

CREATE TABLE tableName

(

column1_name data_type(size),

column2_name data_type(size),

column3_name data_type(size)

)

To create the Dogs table, for example, we would type in the following:

CREATE TABLE tblOwners

(

ID int(1),

Title varchar(5),

Surname varchar(20),

Phone_no varchar(20),

Registration date

)

ID will hold an integer, Title, Surname and Phone_no will hold text and Registration will hold a date. We now have an empty table for the owners of dogs that looks like this:

Every table must have a primary key. This is the one field that is unique for each row in the table, for each record. It is typically an ID number or similar. A primary key is an example of a constraint. You can name constraints (by specifying the name) or allow the DBMS to automatically name it by omitting a name. More on constraints below. We can add the primary key when we create a table, like this:

CREATE TABLE tblOwners

(

ID int(1) NOT NULL,

Title varchar(5),

Surname varchar(20),

Phone_no varchar(20),

Registration date,

PRIMARY KEY (ID)

)

We have added a NOT NULL to our primary key. ALWAYS DO THIS, as a we are specifying that a user MUST type something into the field. Being a primary key, it will automatically be unique. There is a reference further down on constraints. If the key is composite, add NOT NULL to all fields.

We can also add a primary key after a table has been created, by designing an SQL query and running it with these commands. If the PK field is already set to NOT NULL, this part is redundant.

ALTER TABLE tblOwners

ADD PRIMARY KEY (ID) NOT NULL

To add the other table in our database, we would use:

CREATE TABLE tblOwners

(

ID int(1) NOT NULL,

Name varchar(20),

DofB date,

Type varchar(20),

Wins varchar(20),

Owner_ID int,

PRIMARY KEY (ID)

);

Constraints & Relationships

Constraints also can be used to restrict the allowable values for a field. You can restrict values to NOT NULL or UNIQUE, or you can define a check constraint, which is a type of business rule that can be applied to a field. Assume that you want to restrict (or constrain) the values of the first name and last name fields to be unique, meaning that there should never be a combination of first name and last name that is the same for any two records in the table. Because this is a multi-field constraint, it is declared at the table level, not the field level. Use the ADD CONSTRAINT clause and define a multi-field list.

ALTER TABLE tblCustomers ADD CONSTRAINT CustomerID NOT NULL ([Last Name], [First Name])

Relationships, between tables, can be added (if not created with the table) usingthe ALTER TABLE DML command. The example below shows how to add a relationship between two tables. Notice how the table being modified is the table containing the foreign key, referencing the parent table.

ALTER TABLE Orders

ADD FOREIGN KEY (Person_Id)

REFERENCES Customer(Person_Id)

The relationship is just a rule to prevent inconsistent data. That is a rule to state that when you want to link one record to a parent record, that data must be consistent and if the parent data is deleted, some action should be taken to prevent orphaned data,

ALTER TABLE to Drop & Add

The ALTER TABLE command is useful for modifying the table. For example, if you wanted to delete the Surname field in the Owners' table, you would use:

ALTER TABLE tblOwners

DROP COLUMN Surname

If you wanted to add a new column called Surname in the Owners' table (that didn't exist before, perhaps because you just deleted it), you would use:

ALTER TABLE tblOwners

ADD Surname varchar(20)

Structured Query Language (SQL) - adding, removing and amending data

Adding, deleting and updating data in a table

The INSERT INTO command is used to add data into a table. Here is an example of adding a new record to the Owners' table:

INSERT INTO tblOWNERS (ID, TITLE, Surname, Phone_no, Registration)

VALUES (5, "Mr", "Smith", "091123456", '2000-12-01')

Notice the use of quotation marks and the format that of the date.

The DELETE FROM command is used to delete a record (a row) from a table. Here is an example of removing a record from the Owners' table:

DELETE FROM tblOWNERS

WHERE Surname = "Smith"

Notice the use of quotation marks and the format that of the date.

We can update individual pieces of information in a field using the UPDATE command. Here is an example:

UPDATE tblOwners

SET Phone_no="021888888"

WHERE Surname="Smith"

Structured Query Language (SQL) - selecting data

Using SELECT - FROM

Write an SQL query using the following commands:

SELECT Name

FROM tblDogs;

Run the query (in Access, by pressing the exclamation mark) and you should see a list of the names of the dogs. To go back to SQL view, simply right-click on the results of the query and then select SQL view.

The SELECT command is used to select columns to display in a table. The FROM command is used to pick the table(s) you want to get the columns from. You can select more than one column from a table and more than one table. Try running this:

SELECT Name, Type

FROM tblDogs;

You should see a list of names of dogs and their types. Try this:

SELECT Name, Type, [Date of Birth]

FROM tblDogs;

You should get a list of names of dogs, their type and their data of birth. Note the use of square brackets, used because the field name has spaces in it. You can display all of the fields in any particular table easily by using the wildcard symbol, *. Try this:

SELECT *

FROM tblDogs;

You will see all of the fields in all of the records in the dog’s table displayed. You can also display fields from more than one table in the same query. Try this:

SELECT tblDogs.Name, tblDogs.Type, tblOwners.Title, tblOwners.Surname

FROM tblOwners, tblDogs;

Notice the syntax in the SELECT statement. For example tblOwners.Surname means ‘go to the table called tblOwners and get the column called Surname’.

JOINS

When you want to search for data across multiple tables, you need to identify how each table should be linked to each other. You may wonder why you need to do this when you have already created PK/FK relationships? The relationships are mere rules to prevent incosnsitent data being created, enforcing the referential integrity of the database. Some database front end programs may automatically link queries for you by examining the relationships, but the truth is, you can be flexible in how you link tables together.

There are different types of join in SQL, but the specification ONLY specifies the inner join. You can read a lot more about joins here. There is also a useful video below.

The join is what allows you to pull all order records for a given customer, by linking the hypothetical field CustomerID in the table Customers with CustomerID in the table Orders. Therefore, you can specify that you want their name and address (from Customers) along with the item purchased, quantity where the CustomerID matches CustomerID in the two tables.

INNER JOIN

You can experiment in Access easily by designing queries in QBE (Query By Example). QBE is the graphical way of designing queries in Access. You can check that they work and then switch to SQL view to the SQL code (use the ‘design view’ icon under the FILE menu). You can also write queries in SQL and then see the equivalent in QBE. For example, if you design a query in QBE to select the names of dogs and their types and who owns them. When you convert this to SQL, you will get the following:

SELECT tblDogs.Name, tblDogs.Type, tblOwners.Title, tblOwners.Surname

FROM tblOwners INNER JOIN tblDogs ON tblOwners.[Owner ID] = tblDogs.[Owner ID];

This SQL code is different from the code we successfully used in the previous example. The relationship has been defined in the SQL code. It is important to get into the habit of defining the links between tables. Take note of this example and use it in any future SQL queries where you need data from more than one table. If you don’t, you may get strange results!

Note: You can select all fields in a given table by using [table name].*. E.g. customers.*

Note2: For inner joins only, the table order does not matter.

The detail below on different joins is for information purposes only, you will not be expected to use different types of join in any SQL you are asked to write. Only the inner join is part of the specification.

INNER JOIN gets all records from one table that have some related entry in a second table - This is the only join you are expected to know and use in the specification!

LEFT JOIN gets all records from the LEFT linked table but if you have selected some columns from the RIGHT table, if there is no related records, these columns will contain NULL

RIGHT JOIN is like the above but gets all records in the RIGHT table

FULL JOIN gets all records from both tables and puts NULL in the columns where related records do not exist in the opposite table

The SQL below shows an alternative way of performing joins.

SELECT tblDogs.Name, tblDogs.Type, tblOwners.Title, tblOwners.Surname

FROM tblOwners, tblDogs WHERE tblOwners.[Owner ID] = tblDogs.[Owner ID];

Using SELECT - FROM - WHERE

You can search your tables according to some criteria specified in the WHERE statement. There are lots of criteria you can use.

Note that dates in Access must be surround by hash symbols. NULL does not mean zero. It means ‘no data’. There is a big difference! There are other criteria! Try out the following SQL queries. Remember to predict the results before you run the query, then run the query and then compare the prediction to the results.

SELECT Name, Type, [Date of Birth]

FROM tblDogs

WHERE Type="Poodle" OR Type="Spaniel"

SELECT Name, Type, [Date of Birth]

FROM tblDogs

WHERE [Competition wins] IS NULL;

SELECT tblDogs.Name, tblDogs.Type, tblOwners.Title, tblOwners.Surname

FROM tblOwners INNER JOIN tblDogs ON tblOwners.[Owner ID] = tblDogs.[Owner ID]

WHERE Type="Poodle" OR Type="Spaniel";

Don’t forget that if you are getting data from more than one table then you should link them properly in the FROM statement.

Using SELECT - FROM - WHERE - ORDER BY

You can sort the results easily, either in ascending or descending order by any field. For example:

SELECT tblDogs.Name, tblDogs.Type, tblOwners.Title, tblOwners.Surname

FROM tblOwners INNER JOIN tblDogs ON tblOwners.[Owner ID] = tblDogs.[Owner ID]

WHERE Type="Poodle" OR Type="Spaniel"

ORDER BY tblOwners.Surname;

This will order the results in ascending order by the owner’s surname. The reverse order can be obtained like this:

ORDER BY tblOwners.Surname DESC;

Using SELECT - SUM - AS - FROM - GROUP BY

This command can be used to find the totals for each unique entry in columns in tables. For example, suppose you wanted to know how many wins each unique owner in the dog club had in total. You would do it like this:

SELECT tblDogs.[Owner ID], Sum(tblDogs.[Competition wins]) AS [Total number of wins]

FROM tblDogs

GROUP BY tblDogs.[Owner ID];

This SQL command adds up the competition wins for each owner in the tblDogs table. It then displays the total number of wins for each owner under the heading ‘Total number of wins’. To make the most of aggregate function (e.g. count, sum, average) you need to use GROUP BY.

On the surface both clauses appear to do the same thing; that is sort sort data. But this is where their similarities end. In fact, both serve entirely different purposes.

  • The ORDER BY clause’s purpose is to sort the query result by specific columns.

  • The GROUP BY clause’s purpose is summarise unique combinations of columns values. See more here.

Another way to think about GROUP BY is that as you add columns (order is not important, unlike ORDER BY), you are asking for unique record combinations of those respective columns. The fewer the columns, the fewer unique combinations you'll get. E.g. Above we grouped by owner, which will group and summarise based on only each owner. However, if we also stored the city hosting each competition, we could group both fields and get unique values of wins for each city.

This Quora page has further information on the GROUP BY command, as it is almost essential when using any aggregate functions (COUNT, MAX, AVG, SUM, etc).

The video below demonstrates the use and difference between ORDER BY and GROUP BY (which can of course be used together).

More details on aggregate functions (COUNT, SUM, AVERAGE)

Video Instruction

If you prefer to learn SQL the visual way, these videos may help. As stated at the top, I have not watched them to gauge their specific links to the CIE specification.