Normalization2
 Page 1Page 2Page 3Page 4Page 5Page 6Page 7Page 8

Oracle Questions And Answer

Page 1

SQL*PLUS QUESTIONS

ORACLE  FORMS TEST PAPER 
ORACLE QUESTIONS And ANSWERs Page 1 
Oracle Questions And Answer Page 2

Oracle Questions And Answer Page 3

Reports  Questions And Answers

Oracle Assignments

Oracle Faqs

Normalization

We discuss four normal forms: first, second, third, and B oyce-Codd normal forms, that are usually referred to as 1NF, 2NF, 3NF, and BCNF. Normalization is a process that “improves” a database design by generating relations that are of higher normal forms.

The objective of normalization is sometimes stated: “to create relations where every dependency is on the key, the whole key, and nothing but the key”.

A relation that is fully normalized is about a single concept such as a student entity set, a course entity set, and so on. We will give examples later on of relations containing data involving more than one concept; as we will see we can use normalization principles to improve upon these relations.

To understand normal forms we require a solid understanding of functional dependencies. Note there are more normal forms relating to functional dependencies and multi-valued dependencies, but the four we study are particularly important. The interested reader will consult other references for further study.

There is a sequence to normal forms: 1NF is considered the weakest, 2NF is stronger than 1NF, 3NF is stronger than 2NF, and BCNF is considered the strongest of these four normal forms. Also, any relation that is in BCNF, is in 3NF; any relation in 3NF is in 2NF; and any relation in 2NF is in 1NF. Sometimes this correspondence is shown as:

We consider a relation in BCNF to be a fully normalized relation. The reason that a relation in a higher normal form is better than one in a lower normal form is because update semantics for the affected data are simplified. This means that applications required to maintain the database are simpler. In general, we consider fully normalized relations easier to maintain, but in general these fully normalized relations require more work to be expended when retrieving data. This means that retrieving information becomes more costly, more time-consuming.

In our discussion on normalization, we first introduce the concept of functional dependency, then we discuss update anomalies (situations that motivate normalization). Finally, we discuss each of 1NF, 2NF, 3NF, and BCNF.

Functional Dependencies

We say an attribute, B, has a functional dependency on another attribute, A, if for a given value of A, there is at most one value for B. We illustrate this as:

A à B

For example, suppose we are keeping track of student phone numbers, and that we only keep one phone number for each student. Suppose each student is identified by their unique student number. Since for a given student number value there is at most one phone number, we say there is a functional dependency of phone number on student number:

StuNum à StuPhone

The meaning of the above can also be restated as:

StuNum functionally determines StuPhone, or,

StuNum is a determinant of StuPhone.

You may be wondering what is done when a student changes their phone number. Our considerations at this point do not allow for keeping a historical record of student phone numbers - if the student changes their phone number, we will update the phone number attribute and have lost the prior value.

Consider the relation and sample data shown below. There is only one row for each student number, and so for each student number there is only one phone number. The data supports the functional dependency StuNum à StuPhone. The fact that two students have the same phone number is not a problem; for the functional dependency, StuNum à StuPhone, we require that there is only one phone number per student number; it doesn’t restrict the number of students that may share the same phone.

StudentPhone relation {supports the FD StuNum à StuPhone}

StuNum

StuPhone

123

234-5678

456

234-1122

555

234-5678

633

234-1010

787

266-1234

The above StudentPhone relation has two attributes and so there are two possible FDs: 

StuNum à StuPhone

StuPhone  à StuNum

You can see that StuPhone  à StuNum is not true. There are two students with the same phone number (234-5678), and so there cannot be a functional dependency of student number on phone number. The functional dependency, StuPhone  à StuNum, is not valid; it does not exist.

Exercise

Consider the next relation, Student, where StuNum is the primary key, and we have three other attributes for phone number, first name and last name.

Student relation with key StuNum and three non-key attributes

StuNum

StuPhone

StuFname

StuLname

123

234-5678

John

Doe

456

234-1122

Peter

Smith

555

234-5678

Alan

Lee

633

234-1010

Peter

Doe

787

266-1234

Alan

Lee

Exercise

Since StuNum is the primary key, there is only one row per student number and so all attributes are functionally dependent on the primary key. We have the three functional dependencies:

When we draw functional dependency diagrams, these simple diagrams (like the one above) showing all functional dependencies originating from the primary key, are the types we like to see. They are simple and straightforward. One thing they illustrate, is that the relation is about the key, the whole key, and nothing but the key. Each non-key attribute describes some aspect related to the entity identified by the key.

Exercise

Another common way to draw the above three functional dependencies on StuNum is:

Transitive dependency

Note that we could have a relation with three attributes A, B, and C, and where A à B and B à C. Functional dependencies are transitive, which means that we also have the functional dependency A à C; we say that C is transitively dependent on A through B. As an example, suppose we have a relation with attributes EmpNum, EmpName, DeptNum, DeptName, and with the functional dependencies shown in Figure N-7.

EmpNum

EmpName

DeptNum

DeptName

The diagram tells us there is

  • one employee name and one department number for a given employee number,
  • there is one department name associated with a department number.

Because FDs are transitive, we can deduce the dependency:

EmpNum à DeptName

This tells us that there is just one department name for a given employee number. You may have thought that this is obvious! This notion of transitivity is important to the study of 3NF in a later section. Note that the above relation holds data that is relevant to two concepts: employees and departments.

Partial dependency

Consider a relation with department number, department name, course number and course title attributes. The FDs and sample data is shown below. Course number and department number together functionally determine the course title; department number alone functionally determines department name.

DeptNum

CourseNum

CourseTitle

DeptName

DeptNum

CourseNum

CourseTitle

DeptName

91

2914

Relational Databases

Computing

91

1903

Programming I

Computing

44

2201

Intermediate Geography

Geography

35

2201

Intermediate Calculus

Mathematics

In order to identify any row in the corresponding relation, it would be necessary to know both the department number and the course number; DeptNum and CourseNum form the PK. Since DeptName is functionally dependent on a subset of the PK, we say that DeptName is partially dependent on the PK. This notion of partial dependence is important for the study of 2NF. Note that the above relation holds information pertinent to two concepts: courses and departments. Note that CourseTitle is functionally dependent on the whole PK; we say that CourseTitle is fully dependent on the PK.

Anomalies

An anomaly is a variation that differs in some way from what is considered normal. With respect to maintaining a database, we consider what must occur when a database record is updated, inserted, or deleted. In databases (e.g. OLTP databases) where these update, insert, and/or delete operations are common, it is desirable for these operations to be as straightforward and as efficient as possible.

When relations are not fully normalized we say they exhibit update anomalies (because the basic operations are not as efficient as possible). There is some aspect of the relation that may be awkward to maintain. Usually, the design goal for an OLTP database is that it be easy to understand and to maintain. In particular, if the value of one attribute for an entity must be changed, then ideally, that change requires only one record to be updated. If only one record changes, then the cost or time of performing the update is predictable and minimal.

Consider the relation structure and sample records:

DeptNum

CourseNum

StuNum

Grade

StuGpa

91

101

74210

A

3.04

91

101

66277

B

3.75

51

222

66277

B

3.75

33

101

74210

B

3.04

This relation is used for keeping track of the students enrolled in courses, the grade assigned to the student for the course, and (oddly) the student’s overall grade point average. The functional dependency diagram for this case is:

CourseNum

StuNum

Grade

StuGpa

DeptNum

 What must happen if a student’s gpa changes? We always want our databases to have correct information, and so the gpa must change in several records, not just one record. We refer to this type of difficulty as an update anomaly – the simple change of a student’s gpa affects, not just one record, but potentially several records in the database. The update operation is more complex than necessary, and this means it is more expensive to do, resulting in slower performance.

In this case, which attributes constitute the primary key? The primary key is {DeptNum, CourseNum, StuNum}. Note the structure of the functional dependencies. One of them is based on the student number only (partial dependency); the other is based on the full primary key.

Now, we’ll consider delete and insert anomalies.For these examples, assume that a student’s gpa is only stored in this relation. Suppose we happen to delete all rows relating to student 66277. What happens to the student’s gpa information? We lost it! As you probably know, this design is poor – perhaps we should never mix concepts, storing student information with enrolment information! Because we assumed that the gpa is only stored in this relation, this is an example of a deletion anomaly. Next, we consider an insertion anomaly.

Suppose we add a new student (and assume a new student's GPA is 0). How do we add this information (i.e. insert a new record) with the database structure we have? We can’t! Before we could add a row to this relation, we need a course number too. As you can tell, we have made the management of data more difficult with this design. The design makes it difficult to manage student information. If a database were to exist with a table like this, the designers may have used a “special” course number (say course 0) to represent the situation we have just considered. That type of ‘rule’ is something we do not recommend.

The previous discussion concerning anomalies highlights some of the data management issues that arise when data in a relation is not fully normalized. Another way of describing the general problem here, as far as updating a database is concerned, is that redundant data makes it more complicated for us to keep the data consistent. In the example we have used, the GPA for a student is stored redundantly (repeatedly), the same value for the same student appears in several rows.

First Normal Form

We say a relation is in 1NF if all values stored in the relation are single-valued and atomic. With this rule, we are simplifying the structure of a relation; we are simplifying the kinds of values that are stored in the relation. In fact some definitions you may encounter for relation, state or imply that for something to be a relation it must be in first normal form; first normal form is built into the definition of relation.

Consider the following EmployeeDegrees relation. Since EmpDegrees is a multi-valued attribute (EmpDegrees holds all the degrees that an Employee has earned), the relation is not 1NF.

EmployeeDegrees

EmpNum

EmpPhone

EmpDegrees

123

233-9876

333

233-1231

BA, BSc, PhD

679

233-1231

BSc, MSc

How would you add ‘PhD’ for employee 679? You could do it, but it would take a little bit of programming skill, and your operation is not a simple SQL update. Consider other possibilities. How would you delete MSc for employee 679? How would you change the BSc for employee 333 to be a BA? In general, we say a non-1NF relation is difficult to maintain. Relational database systems have become successful and prolific, partly because they give us efficient mechanisms for handling 1NF data.

Exercise: Draw an ERD corresponding to the EmployeeDegrees relation.

Exercise: What needs to be done to achieve 1NF?

Exercise: Draw an ERD corresponding to the solution (the two tables)  given below.

Answer to second exercise: Replace EmployeeDegrees with two relations, one for employee data and the other for employee education data. Consider the following tables. Note the primary keys for these two relations, and how one relation has a foreign key referencing the other relation. 

Employee

EmpNum

EmpPhone

123

233-9876

333

233-1231

679

233-1231

EmployeeDegree

EmpNum

EmpDegree

333

PhD

333

BSc

333

BA

679

MSc

679

BSc

The PK of EmployeeDegree is {EnpNum, EmpDegree}. EmployeeDegree is 1NF. Note that EmpNum in EmployeeDegree is a FK referencing the PK in Employee.

Second Normal Form

2NF (and 3NF) both involve the concepts of key and non-key attributes. 2NF is where partial dependencies play an important role. A key attribute is any attribute that is part of a key; any attribute that is not a key attribute is a non-key attribute. Our first statement of 2NF is: A relation is in 2NF if it is in 1NF, and every non-key attribute is fully dependent on the primary key. We’ll revisit our definition at the end of this section.

Consider the following relation and FDs. There are 3 key attributes and 2 non-key attributes. One of these non-key attributes, StuGpa, is dependent on StuNum. In this case we have a partial dependency. StuGpa is partially dependent on the primary key.

EnrollmentAndGPAs

DeptNum

CourseNum

StuNum

Grade

StuGpa

91

101

74210

A

3.04

91

101

66277

B

3.75

51

222

66277

B

3.75

33

101

74210

B

3.04

CourseNum

StuNum

Grade

StuGpa

DeptNum

When we have a relation such as the above, we can easily split the relation into two (in general, two or more) relations that will both be in 2NF, and where (importantly) we have not lost any information.

CourseNum

StuNum

Grade

StuGpa

DeptNum

Decomposition: we can split a relation into two or more relations. It is important to not lose information – we want a lossless decomposition. We can join these new relations on StuNum and get back what we had before.

StuGpa

StuNum

CourseNum

StuNum

Grade

DeptNum

As a result we end up with the following data in the two relations. Note that because of the StuNum foreign key in the enrolment table, we are able to recreate the data we had before the decomposition. The only non-key attribute in Enrollment is Grade and it is fully dependent on the PK

Enrollment

DeptNum

CourseNum

StuNum

Grade

91

101

74210

A

91

101

66277

B

51

222

66277

B

33

101

74210

B

Student

StuNum

StuGpa

74210

3.04

66277

3.75

When we recognize a relation is not in 2NF, it is because of one or more partial dependencies. When we decompose to form 2NF relations, we remove partial dependencies. We have ensured that the non-key attributes describe the whole key.

The choice of primary key is arbitrary: it is just one of the candidate keys. To make our definition of 2NF more precise we relate full dependence to the candidate keys: A relation is in 2NF if it is in 1NF, and every non-key attribute is fully dependent on each candidate key.

Third Normal Form

Third normal form involves the concepts of candidate key, non-key attribute and transitive dependency. We say a relation is in 3NF if the relation is in 1NF and all determinants of non-key attributes are candidate keys.

For example, suppose we have an employee relation with EmpNum, EmpName, DeptNum, DeptName, and with the functional dependencies shown below. We are assuming each employee has one name, works in one department, and each department has one name.

Sample table not in 3NF.

EmpNum

EmpName

DeptNum

DeptName

We shall assume that the relation is in 1NF. The only candidate key is EmpNum and so it is the primary key too. The relation satisfies the requirements for 2NF. Is the relation in 3NF? No, it is not in 3NF because of the transitive dependency of DeptName on EmpNum via DeptNum; DeptName is dependent on DeptNum and DeptNum is not a candidate key. Consider sample data in the following table:

EmpNum

EmpName

DeptNum

DeptName

111

Smith

6

Parts

222

Jones

6

Parts

333

Stevens

7

Billing

Note the redundancy: the same department name appears in more than one row. To achieve 3NF, we decompose again, replacing the given relation by one or more other relations in such a way that each new relation is in 3NF and there is no loss of information. Consider the following decomposition diagram:

EmpNum

EmpName

DeptNum

DeptName

Decomposition: we can split a table into two or more tables. It is important to not lose information – we want a lossless decomposition.

EmpNum

EmpName

DeptNum

DeptNum

DeptName

Decomposing a table to generate 3NF tables

The content of the two relations, for our sample data, is shown below.

The only determinant is EmpNum, Note that DeptNum in Employee is a foreign key referencing the primary key DeptNum in Department.

Employee

EmpNum

EmpName

DeptNum

111

Smith

6

222

Jones

6

333

Stevens

7

The only determinant is DeptNum.

Department

DeptNum

DeptName

6

Parts

7

Billing

When we decompose a relation that is not in 3NF into 3NF relations, we are removing any unwanted transitive dependencies. We are ensuring that our relations have non-key attributes that describe an entity represented by the primary key, and nothing but the primary key.