(a) Relational database, flat file, primary key, foreign key, secondary key, entity relationship modelling, normalisation and indexing.
(b) Methods of capturing, selecting, managing and exchanging data.
(c) Normalisation to 3NF.
(d) SQL – Interpret and modify.
(e) Referential integrity.
(f) Transaction processing, ACID (Atomicity, Consistency, Isolation, Durability), record locking and redundancy.
A database is organised using a set of key components. These include:
Unit of data: Each individual piece of data entered is a unit of data. These units are also called data elements.
Primary key: contains a unique identifier for each record. To make each record in a database unique we normally assign them a primary key. Even if a record is deleted from a database, the primary key will not be used again. The primary key can be automatically generated and will normally just be a unique number or mix of numbers and letters.
Field: The column headings are called the fields. Each field contains a different attribute. For every entity, a unit of data will be entered into each field. Each column might require different data types. For example, the 'Title' column will require data entered as text and the 'Certificate' column will need data entered as numbers.
Record: Records contain a collection of data for each entity, usually recorded as a row in the table.
Table: The table contains all of the fields and the records for one type of entity. A database may contain more than one table.
This is an example table of a flat-file database. The entities are films and the attributes are details about the films
A flat-file database consists of data stored within a plain text file. Records are stored one per line and each attribute of the record is separated by a delimiter – often in the form of a comma or tab.
Very easy to create and use (for small amounts of data)
It is easy to extract information from them, less security
There is a lot of redundant and repeated data in flat file databases
this can lead to further problems with inconsistency and an increased chance of errors in data entry – data integrity
They can be very slow and time consuming to search especially if it is a large database
sharing information from flat-file databases can be complicated and inefficient
A Flat file database as CSV file
A flat file database as table
Relational Database Mode
Relational database work by splitting data about different entities (types of things) into separate relations (tables). Each relation only contains data about one entity and relationships (connections) are made between each of the relations through the use of primary and foreign keys.
Each table still only contains the actual data about the entity itself:
Pupil Table
Teacher Table
Class Table
By separating the data into separate tables there is less duplication, therefore less chance of corruption of data. However, due to the relationship information stored in a data dictionary, you can use a language called Structured Query Language (SQL) to perform powerful and complex queries, combining information from multiple tables.
Among other things SQL allows :
Sorting of data
Searching of data
Filtering of data
Combining data from multiple tables using join queries
Advantages:
Data is only stored once, no repetition of data entry
more efficient
easy to delete and or modify details
avoids inconsistency and problems with data integrity
Better security, by splitting data into tables certain tables can be made confidential
Disadvantages:
More expensive to set up and manage than flat-file database.
Primary key: A field in a table that is used to uniquely identify a record, something that cannot be duplicated. An example would be the registration number of a car
Foreign key: When the primary key field from one table in a relational database is used as an ordinary field in another table in the database to create a relationship between the two tables.
Secondary key: In the event that a primary key is not enough to distinguish an object, a secondary key can be used to render that object unique. It is sorted in relation to a primary key, making search terms clearer so that only desired results appear when a table is queried
Task
Read the presentation regarding Database Management Systems (DBMS) (CLICK HERE)
Methods of Capturing Data
Data needs to be input into the database and there are multiple methods of doing this.
The chosen method is always dependent on the context. For example, if pedestrians are participating in a survey, their responses will need to be manually entered into the database.
Data is also captured when people pay cheques. Banks scan cheques using Magnetic Ink Character Recognition (MICR). All of the details excluding the amount are printed in a special magnetic ink which can be recognised by a computer but the amount must be entered manually.
Optical Mark Recognition (OMR) is used for multiple-choice questions on a test.
Some other Methods of Capurting Data could be:
1. Voice recognition - E.g. phone-integrated AI such as Siri, Google Now
2. Optical Character Recognition (OCR) - E.g. scanning word documents and converting them to a text format
3. Barcodes and QR codes - E.g. in shops, on promotional posters
Selecting and Managing Data
Selecting the correct data is an important part of data preprocessing. This could involve only selecting data that fits a certain criteria to reduce the volume of input. For example, a camera catching speeding cars will only select cars going above a certain speed
The three types of relationship
One-to-one: Quite rare
E.g. a student can only enrol on one course, and each course can only have one student enrolled (obviously, this is very unlikely in real life) May be used for security reasons, e.g. separating payment details and customer details into two tables
Many-to-one:
E.g. a university student can only enrol on one course, but each course can have multiple students enrolled
Many-to-many:
E.g. a high school student can enrol on multiple courses, and each course can have multiple student enrolled
An example of an entity relationship model would look something like this...
A one-to-one relationship is modelled by a single solid line, shown between entity 1 and entity 2. One-to-many and many-to-many relationships are modelled using three lines such as the one-to-many between entities 2 and 3 and the many-to-many between entities 3 and 4.
A one-to-many relationship means one object from entity 2 can be relational to many objects from entity 3, however one object from entity 3 can not be relational to many objects from entity 2.
A many-to-many relationship means many objects from entity 3 can be relational to many objects from entity 4 and vice versa. Many-to-many relationships cannot be allowed to exist in practice as it is nearly impossible to extract information from them. In order to resolve this issue, a linking table can be created between the two entities with a many-to-many relationship. Both entities (3 and 4) would have a one-to-many relationship with this linking table.
Normalisation
Normalisation is a process used to create the best possible database design so that data is not duplicated. There are 3 stages: 1st Normal Form; 2nd Normal Form; 3rd Normal Form
Further Reading: http://www.gitta.info/LogicModelin/en/html/unit_DataConsiten.html
The table must contain no duplicated attributes (attribute = field, datatype, validation, etc.)
It must be atomic, meaning one attribute cannot contain multiple different data entries
The table must first be normalised into 1st normal form, you can’t skip straight to 2nd. It must then contain no partial dependencies
This can only occur if the primary key is composite
Continues from 2NF
It contains no non-key dependencies so ...“All attributes are dependent on the key, the whole key and nothing but the key”
Task
A nice video that explains about Database Normalisation - watch video and make some notes along with examples.(compare you notes with someone else in the group).
Read the presentation with examples of the 3 different Normalisations (CLICK HERE)
Task
Read through the PDF notes and then complete the exercise (CLICK HERE)
The following is student enrolment data
Put the above data into 1st, 2nd and then 3rd Normal Form, giving the table attributes, identifying the key attributes at each stage (record structure). On completion, produce the entity relationship diagram for the database.
Structured Query Language (SQL)
Game Opportunity - Read through the SQL section and try the following SQL themed games:
SELECT: SELECT statements are used to fetch data from a database. All queries will begin with SELECT
FROM: FROM clause is used to list the tables and any joins required for the SQL statement.
WHERE: WHERE filters the result set to include only data where valid data has been created / condition has been met
Referential integrity is a relational database concept meaning table relationships must always be consistent. Any foreign key field must agree with and be the same data type as the primary key it is referencing. Any primary key field changes must therefore be applied to all foreign keys and vice versa.
Task
Read through the Google Slide Presentation (CLICK HERE)
Transaction Processing
A data transaction is defined as a single operation executed on data.
ACID (Atomicity, Consistency, Isolation, Durability)
Atomicity: - A transaction must be processed in its entirety or not at all,
Consistency: - A transaction must maintain the referential integrity rules between linked tables,
Isolation: - Simultaneous executions of transactions should lead to the same result as if they were executed one after the other,
Durability: - Once a transaction has been executed it will remain so regardless of the circumstances surrounding it, such as in the event of a power cut.
Record Locking
The process of preventing simultaneous access to records in a database is called record locking and it is used in order to prevent inconsistencies or a loss of updates. While one person is editing a record, this ‘locks’ the record so prevents others from accessing the same record. There are some problems with this, the biggest of which is deadlock.
The easiest way to explain this is through an example:
User 1 accesses Customer 1’s record and as a result, locks Customer 1’s record.
Simultaneously User 2 accesses Customer 2’s record and as a result, locks Customer 2’s record.
Now User 1 tries to access Customer 2’s record, and User 2 tries to access Customer 1’s record.
User 1 waits for Customer 2’s record to be free and User 2 waits for Customer 1’s record to be free and as they are both waiting, there is no progress causing a deadlock.
Redundancy
Some information is very important and people and companies cannot afford to lose this information. This is where redundancy comes in. Redundancy is the process of having one or more copies of the data in physically different locations. This means that if there is any damage to one copy the others will remain unaffected and can be recovered.
Revision Task
Recapping by reading through the Google Slide Presentation (CLICK HERE)
PAST PAPER QUESTIONS
Try and answer the past paper exam questions -You can write your answers on paper or print out the exam paper - Mark Scheme is provided at the end of the paper questions. (try not to look at the answers before attempting all questions)