1.3.2 Databases

What you need to know

(a) Relational database, flat file, primary key, foreign key, secondary key, entity relationship modelling, normalisation and indexing.

(b) Methods of capturing, selecting, managing and exchanging data.

(d) SQL – Interpret and modify.

(e) Referential integrity.

(f) Transaction processing, ACID (Atomicity, Consistency, Isolation, Durability), record locking and redundancy.

Database structure

A database is organised using a set of key components. These include:

Unit of data: Each individual piece of data entered is a unit of data. These units are also called data elements.

Primary key: contains a unique identifier for each record. To make each record in a database unique we normally assign them a primary key. Even if a record is deleted from a database, the primary key will not be used again. The primary key can be automatically generated and will normally just be a unique number or mix of numbers and letters.

Field: The column headings are called the fields. Each field contains a different attribute. For every entity, a unit of data will be entered into each field. Each column might require different data types. For example, the 'Title' column will require data entered as text and the 'Certificate' column will need data entered as numbers.

Record: Records contain a collection of data for each entity, usually recorded as a row in the table.

Table: The table contains all of the fields and the records for one type of entity. A database may contain more than one table.

This is an example table of a flat-file database. The entities are films and the attributes are details about the films

Flat File Databases

A flat-file database consists of data stored within a plain text file. Records are stored one per line and each attribute of the record is separated by a delimiter – often in the form of a comma or tab.

Benefits of a flat-file database:

Very easy to create and use (for small amounts of data)

Problems with flat file databases:

It is easy to extract information from them, less security
There is a lot of redundant and repeated data in flat file databases
this can lead to further problems with inconsistency and an increased chance of errors in data entry – data integrity
They can be very slow and time consuming to search especially if it is a large database
sharing information from flat-file databases can be complicated and inefficient

A Flat file database as CSV file

A flat file database as table

Relational Database Mode

Relational database work by splitting data about different entities (types of things) into separate relations (tables). Each relation only contains data about one entity and relationships (connections) are made between each of the relations through the use of primary and foreign keys.

Each table still only contains the actual data about the entity itself:

Pupil Table

Teacher Table

Class Table

By separating the data into separate tables there is less duplication, therefore less chance of corruption of data. However, due to the relationship information stored in a data dictionary, you can use a language called Structured Query Language (SQL) to perform powerful and complex queries, combining information from multiple tables.

Among other things SQL allows :

Sorting of data
Searching of data
Filtering of data
Combining data from multiple tables using join queries

Advantages:

Data is only stored once, no repetition of data entry
more efficient
easy to delete and or modify details
avoids inconsistency and problems with data integrity
Better security, by splitting data into tables certain tables can be made confidential

Disadvantages:

More expensive to set up and manage than flat-file database.

Primary key: A field in a table that is used to uniquely identify a record, something that cannot be duplicated. An example would be the registration number of a car

Foreign key: When the primary key field from one table in a relational database is used as an ordinary field in another table in the database to create a relationship between the two tables.

Secondary key: In the event that a primary key is not enough to distinguish an object, a secondary key can be used to render that object unique. It is sorted in relation to a primary key, making search terms clearer so that only desired results appear when a table is queried

Composite key: A composite key is a combination of two or more fields in a table that can be used to uniquely identify each record in the table.

Task

Read the presentation regarding Database Management Systems (DBMS) (CLICK HERE)

Methods of Capturing Data

Data needs to be input into the database and there are multiple methods of doing this.
The chosen method is always dependent on the context. For example, if pedestrians are participating in a survey, their responses will need to be manually entered into the database.
Data is also captured when people pay cheques. Banks scan cheques using Magnetic Ink Character Recognition (MICR). All of the details excluding the amount are printed in a special magnetic ink which can be recognised by a computer but the amount must be entered manually.
Optical Mark Recognition (OMR) is used for multiple-choice questions on a test.

Some other Methods of Capurting Data could be:

1. Voice recognition - E.g. phone-integrated AI such as Siri, Google Now

2. Optical Character Recognition (OCR) - E.g. scanning word documents and converting them to a text format

3. Barcodes and QR codes - E.g. in shops, on promotional posters

Selecting and Managing Data

Selecting the correct data is an important part of data preprocessing. This could involve only selecting data that fits a certain criteria to reduce the volume of input. For example, a camera catching speeding cars will only select cars going above a certain speed

The three types of relationship

One-to-one: Quite rare

E.g. a student can only enrol on one course, and each course can only have one student enrolled (obviously, this is very unlikely in real life) May be used for security reasons, e.g. separating payment details and customer details into two tables

Many-to-one:

E.g. a university student can only enrol on one course, but each course can have multiple students enrolled

Many-to-many:

E.g. a high school student can enrol on multiple courses, and each course can have multiple student enrolled

An example of an entity relationship model would look something like this...

A one-to-one relationship is modelled by a single solid line, shown between entity 1 and entity 2. One-to-many and many-to-many relationships are modelled using three lines such as the one-to-many between entities 2 and 3 and the many-to-many between entities 3 and 4.

A one-to-many relationship means one object from entity 2 can be relational to many objects from entity 3, however one object from entity 3 can not be relational to many objects from entity 2.

A many-to-many relationship means many objects from entity 3 can be relational to many objects from entity 4 and vice versa. Many-to-many relationships cannot be allowed to exist in practice as it is nearly impossible to extract information from them. In order to resolve this issue, a linking table can be created between the two entities with a many-to-many relationship. Both entities (3 and 4) would have a one-to-many relationship with this linking table.

Normalisation

Normalisation is a process used to create the best possible database design so that data is not duplicated. There are 3 stages: 1st Normal Form; 2nd Normal Form; 3rd Normal Form

Further Reading: http://www.gitta.info/LogicModelin/en/html/unit_DataConsiten.html

1st Normal Form

The table must contain no duplicated attributes (attribute = field, datatype, validation, etc.)
It must be atomic, meaning one attribute cannot contain multiple different data entries

2nd Normal Form

The table must first be normalised into 1st normal form, you can’t skip straight to 2nd. It must then contain no partial dependencies
- This can only occur if the primary key is composite

3rd Normal Form

Continues from 2NF
It contains no non-key dependencies so ...“All attributes are dependent on the key, the whole key and nothing but the key”

1st, 2nd and 3rd Normal Form (Database Normalisation)_720P HD.mp4

Task

A nice video that explains about Database Normalisation - watch video and make some notes along with examples.(compare you notes with someone else in the group).

Read the presentation with examples of the 3 different Normalisations (CLICK HERE)

Task

Read through the PDF notes and then complete the exercise (CLICK HERE)

Normalisation Exercise

The following is student enrolment data

Put the above data into 1st, 2nd and then 3rd Normal Form, giving the table attributes, identifying the key attributes at each stage (record structure). On completion, produce the entity relationship diagram for the database.

Structured Query Language (SQL)

Game Opportunity - Read through the SQL section and try the following SQL themed games:

SQL Murder Mystery

SQL Island Adventure Game

SELECT: SELECT statements are used to fetch data from a database. All queries will begin with SELECT

FROM: FROM clause is used to list the tables and any joins required for the SQL statement.

WHERE: WHERE filters the result set to include only data where valid data has been created / condition has been met

LIKE: LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column. For example WHERE name LIKE “A%” finds any values that start with “A”.

AND: AND is an operator that combines two conditions. Both conditions must be true for the data to be included in the result set.

OR: OR is an operator that combines two conditions. If either condition is true the data will be included in the result set.

DELETE: DELETE statements are used to remove records from table.

INSERT: INSERT statements are used to add new rows to tables

WILDCARDS: The % and * characters are wildcards that are used to represent zero, one or multiple characters. For example when using the LIKE operator, % can be used before or after a letter to search for any data value that letter at the start or end e.g. LIKE “%a%” finds data in the form a and LIKE “a%” finds data in the form a

Task

Download and install to either a USB memory stick or your computer either XAMPP of WAMP server
Follow the PHP video shown below to create SQL instructions to connect to a database and insert data.
Once you have a working database can you use some other SQL methods in your PHP file.

Referential Integrity

Referential integrity is a relational database concept meaning table relationships must always be consistent. Any foreign key field must agree with and be the same data type as the primary key it is referencing. Any primary key field changes must therefore be applied to all foreign keys and vice versa.

Task

Read through the Google Slide Presentation (CLICK HERE)

Transaction Processing

A data transaction is defined as a single operation executed on data.

ACID (Atomicity, Consistency, Isolation, Durability)

Atomicity: - A transaction must be processed in its entirety or not at all,

Consistency: - A transaction must maintain the referential integrity rules between linked tables,

Isolation: - Simultaneous executions of transactions should lead to the same result as if they were executed one after the other,

Durability: - Once a transaction has been executed it will remain so regardless of the circumstances surrounding it, such as in the event of a power cut.

Record Locking

The process of preventing simultaneous access to records in a database is called record locking and it is used in order to prevent inconsistencies or a loss of updates. While one person is editing a record, this ‘locks’ the record so prevents others from accessing the same record. There are some problems with this, the biggest of which is deadlock.

The easiest way to explain this is through an example:

User 1 accesses Customer 1’s record and as a result, locks Customer 1’s record.
Simultaneously User 2 accesses Customer 2’s record and as a result, locks Customer 2’s record.
Now User 1 tries to access Customer 2’s record, and User 2 tries to access Customer 1’s record.
User 1 waits for Customer 2’s record to be free and User 2 waits for Customer 1’s record to be free and as they are both waiting, there is no progress causing a deadlock.

Redundancy

Some information is very important and people and companies cannot afford to lose this information. This is where redundancy comes in. Redundancy is the process of having one or more copies of the data in physically different locations. This means that if there is any damage to one copy the others will remain unaffected and can be recovered.

Revision Task

Recapping by reading through the Google Slide Presentation (CLICK HERE)

PAST PAPER QUESTIONS

Try and answer the past paper exam questions -You can write your answers on paper or print out the exam paper - Mark Scheme is provided at the end of the paper questions. (try not to look at the answers before attempting all questions)

1.3.2 Database Exam Questions.pdf

Page updated

Report abuse