Basics of Data Modelling
 

Basics of Data Modelling


In this section and the following, i have collected many possible differing examples of SQL queries starting with the basic ones and graduating to the most complex ones. A relational database stores data in tables (relations). A database is a collection of tables. A table consists of a list of records - each record in a table has the same structure, each has a fixed number of "fields" of a given type.So it is using tables in an RDBMS in terms of which you model your data.

Tables

create table <table> (
<column 1> <data type> [not null] [unique] [<column constraint>],
: : : : : : : : :
<column n> <data type> [not null] [unique] [<column constraint>],
[<table constraint(s)>]
);

The keyword unique speci es that no two tuples can have the same attribute value for this column. Unless the condition not null is also speci ed for this column, the attribute value null is allowed and two tuples having the attribute value null for this column do not violate the constraint.
Example: The create table statement for our EMP table has the form

create table EMP (
EMPNO number(4) not null,
ENAME varchar2(30) not null,
JOB varchar2(10),
MGR number(4),
HIREDATE date,
SAL number(7,2),
DEPTNO number(2)
);

Remark: Except for the columns EMPNO and ENAME null values are allowed.

CREATE TABLE your_table_name (
       the_key_column      key_data_type PRIMARY KEY,
       a_regular_column      a_data_type,
       an_important_column   a_data_type NOT NULL,
       ... up to 996 intervening columns in Oracle8 ...
       the_last_column       a_data_type
);
Even in a simple example such as the one above, there are few items worth noting. First, I like to define the key column(s) at the very top. Second, the primary key constraint has some powerful effects. It forces the_key_column to be non-null. It causes the creation of an index on the_key_column, which will slow down updates to your_table_name but improve the speed of access when someone queries for a row with a particular value of the_key_column. Oracle checks this index when inserting any new row and aborts the transaction if there is already a row with the same value for the_key_column. Third, note that there is no comma following the definition of the last row.

If you didn't get it right the first time, you'll probably want to

alter table your_table_name add (new_column_name a_data_type any_constraints);

or

alter table your_table_name modify (existing_column_name new_data_type new_constraints);

In Oracle 8i you can drop a column:

alter table your_table_name drop column existing_column_name;

If you're still in the prototype stage, you'll probably find it easier to simply

drop table your_table_name;

and recreate it. At any time, you can see what you've got defined in the database by querying Oracle's Data Dictionary:

SQL> select table_name from user_tables order by table_name;
 

after which you will typically type describe table_name_of_interest in SQL*Plus:

SQL> describe users;

Note that Oracle displays its internal data types rather than the ones you've given, e.g., number(38) rather than integer and varchar2 instead of the specified varchar.

Constraints

The specification of a (simple) constraint has the following form:

[constraint <name>] primary key|unique|not null

A constraint can be named. It is advisable to name a constraint in order to get more meaningful information when this constraint is violated due to, e.g., an insertion of a tuple that violates the constraint. If no name is speci ed for the constraint, Oracle automatically generates a name of the pattern SYS C<number>.
The two most simple types of constraints have already been discussed: not null and unique. Probably the most important type of integrity constraints in a database are primary key constraints. A primary key constraint enables a unique identi cation of each tuple in a table.Based on a primary key, the database system ensures that no duplicates appear in a table. Note that in contrast to a unique constraint, null values are not allowed.For example, for our EMP table, the specification

create table EMP (
EMPNO number(4) constraint pk emp primary key,
... );

Example:
We want to create a table called PROJECT to store information about projects. For each project, we want to store the number and the name of the project, the employee number of the project's manager, the budget and the number of persons working on the project, and the start date and end date of the project. Furthermore, we have the following conditions:
- a project is identi ed by its project number,
- the name of a project must be unique,
- the manager and the budget must be de ned.

Table definition:

create table PROJECT (
PNO number(3) constraint prj pk primary key,
PNAME varchar2(60) unique,
PMGR number(4) not null,
PERSONS number(5),
BUDGET number(8,2) not null,
PSTART date,
PEND date);

A unique constraint can include more than one attribute. In this case the pattern unique(<columni>, : : : , <column j>) is used. If it is required, for example, that no two projects have the same start and end date, we have to add the table constraint constraint no same dates unique(PEND, PSTART) This constraint has to be de ned in the create table command after both columns PEND and PSTART have been de ned. A primary key constraint that includes more than only one column can be speci ed in an analogous way.
Instead of a not null constraint it is sometimes useful to specify a default value for an attribute if no value is given, e.g., when a tuple is inserted. For this, we use the default clause.
Example:
If no start date is given when inserting a tuple into the table PROJECT, the project start date should be set to January 1st, 1995:
PSTART date default('01-JAN-95')
Note: Unlike integrity constraints, it is not possible to specify a name for a default.

When you're defining a table, you can constrain single rows by adding some magic words after the data type:

  • not null; requires a value for this column
  • unique; two rows can't have the same value in this column (side effect in Oracle: creates an index)
  • primary key; same as unique except that no row can have a null value for this column and other tables can refer to this column
  • check; limit the range of values for column, e.g., rating integer check(rating > 0 and rating <= 10)
  • references; this column can only contain values present in another table's primary key column, e.g., user_id not null references users in the bboard table forces the user_id column to only point to valid users. An interesting twist is that you don't have to give a data type for user_id; Oracle assigns this column to whatever data type the foreign key has (in this case integer).

Constraints can apply to multiple columns:

create table static_page_authors (
 page_id  integer not null references static_pages,
 user_id  integer not null references users,
 notify_p char(1) default 't' check (notify_p in ('t','f')),
 unique(page_id,user_id)
);

Oracle will let us keep rows that have the same page_id and rows that have the same user_id but not rows that have the same value in both columns (which would not make sense; a person can't be the author of a document more than once). Suppose that you run a university distinguished lecture series. You want speakers who are professors at other universities or at least PhDs. On the other hand, if someone controls enough money, be it his own or his company's, he's in. Oracle stands ready:

create table distinguished_lecturers (
 lecturer_id  integer primary key,
 name_and_title  varchar(100),
 personal_wealth  number,
 corporate_wealth number,
 check (instr(upper(name_and_title),'PHD') <> 0
               or instr(upper(name_and_title),'PROFESSOR') <> 0
               or (personal_wealth + corporate_wealth) > 1000000000)
);

The most simple way to insert a tuple into a table is to use the insert statement

insert into <table> [(<column i, : : : , column j>)]
values (<value i, : : : , value j>);

For each of the listed columns, a corresponding (matching) value must be speci ed. Thus an insertion does not necessarily have to follow the order of the attributes as speci ed in the create table statement. If a column is omitted, the value null is inserted instead. If no column list is given, however, for each column as de ned in the create table statement a value must be given.
Examples:
insert into PROJECT(PNO, PNAME, PERSONS, BUDGET, PSTART)
values(313, 'DBS', 4, 150000.42, '10-OCT-94');
or
insert into PROJECT
values(313, 'DBS', 7411, null, 150000.42, '10-OCT-94', null);

Now continuing with our example above,
insert into distinguished_lecturers
values
(1,'Professor Ellen Egghead',-10000,200000);

1 row created.

insert into distinguished_lecturers
values
(2,'Bill Gates, innovator',75000000000,18000000000);

1 row created.

insert into distinguished_lecturers
values
(3,'Joe Average',20000,0);

ORA-02290: check constraint (PHOTONET.SYS_C001819) violated

As desired, Oracle prevented us from inserting some random average loser into the distinguished_lecturers table, but the error message was confusing in that it refers to a constraint given the name of "SYS_C001819" and owned by the PHOTONET user. We can give our constraint a name at definition time:

create table distinguished_lecturers (
 lecturer_id  integer primary key,
 name_and_title  varchar(100),
 personal_wealth  number,
 corporate_wealth number,
 constraint ensure_truly_distinguished
        check (instr(upper(name_and_title),'PHD') <> 0
               or instr(upper(name_and_title),'PROFESSOR') <> 0
               or (personal_wealth + corporate_wealth) > 1000000000)
);

Note: instr() is a built-in function which checks whether the pattern(arg2) exists in the string (arg1).
insert into distinguished_lecturers
values
(3,'Joe Average',20000,0);

ORA-02290: check constraint (PHOTONET.ENSURE_TRULY_DISTINGUISHED) violated

Now the error message is easier to understand by application programmers.

Creating More Elaborate Constraints with Triggers

The default Oracle mechanisms for constraining data are not always adequate. For example, the ArsDigita Community System auction module has a table called au_categories. The category_keyword column is a unique shorthand way of referring to a category in a URL. However, this column may be NULL because it is not the primary key to the table. The shorthand method of referring to the category is optional.

create table au_categories (
        category_id             integer primary key,
        -- shorthand for referring to this category,
        -- e.g. "bridges", for use in URLs
        category_keyword        varchar(30),
        -- human-readable name of this category,
        -- e.g. "All types of bridges"
        category_name           varchar(128) not null
);

We can't add a UNIQUE constraint to the category_keyword column. That would allow the table to only have one row where category_keyword was NULL. So we add a trigger that can execute an arbitrary PL/SQL expression and raise an error to prevent an INSERT if necessary:

create or replace trigger au_category_unique_tr
before insert
on au_categories
for each row
declare
        existing_count integer;
begin
        select count(*) into existing_count from au_categories
            where category_keyword = :new.category_keyword;
        if existing_count > 0
        then
            raise_application_error(-20010, 'Category keywords must be unique if used');
        end if;
end;

This trigger queries the table to find out if there are any matching keywords already inserted. If there are, it calls the built-in Oracle procedure raise_application_error to abort the transaction.

DML (Insert/Update/Delete)

If there are already some data in other tables, these data can be used for insertions into a new table. For this, we write a query whose result is a set of tuples to be inserted. Such an insert statement has the form

insert into <table> [(<column i, : : : , column j>)] <query>

create table OLDEMP (
ENO number(4) not null,
HDATE date);

We now can use the table EMP to insert tuples into this new relation:

insert into OLDEMP (ENO, HDATE)
select EMPNO, HIREDATE from EMP
where HIREDATE < '31-DEC-60';

Update
For modifying attribute values of (some) tuples in a table, we use the update statement:

update <table> set
<column i> = <expression i>, : : : , <column j> = <expression j>
[where <condition>];

An expression consists of either a constant (new value), an arithmetic or string operation, or an SQL query. Note that the new value to assign to <column i> must a the matching data type.
An update statement without a where clause results in changing respective attributes of all tuples in the speci ed table. Typically, however, only a (small) portion of the table requires an update.
Examples:

  • The employee JONES is transfered to the department 20 as a manager and his salary is increased by 1000:

update EMP set
JOB = 'MANAGER', DEPTNO = 20, SAL = SAL +1000
where ENAME = 'JONES';

  • All employees working in the departments 10 and 30 get a 15% salary increase.

update EMP set
SAL = SAL * 1.15 where DEPTNO in (10,30);

Analogous to the insert statement, other tables can be used to retrieve data that are used as new values. In such a case we have a <query> instead of an <expression>.
Example: All salesmen working in the department 20 get the same salary as the manager who has the lowest salary among all managers.

update EMP set
SAL = (select min(SAL) from EMP
where JOB = 'MANAGER')
where JOB = 'SALESMAN' and DEPTNO = 20;

Explanation: The query retrieves the minimum salary of all managers. This value then is assigned to all salesmen working in department 20.

It is also possible to specify a query that retrieves more than only one value (but still only one tuple!). In this case the set clause has the form set(<column i, ... , column j>) = <query>.It is important that the order of data types and values of the selected row exactly correspond to the list of columns in the set clause.

Delete
All or selected tuples can be deleted from a table using the delete command:

delete from <table> [where <condition>];

If the where clause is omitted, all tuples are deleted from the table. An alternative command for deleting all tuples from a table is the truncate table <table> command. However, in this case, the deletions cannot be undone.

Example: Delete all projects (tuples) that have been nished before the actual date (system date):

delete from PROJECT where PEND < sysdate;

sysdate is a function in SQL that returns the system date. Another important SQL function is user, which returns the name of the user logged into the current Oracle session.

Commit and Rollback
A sequence of database modi cations, i.e., a sequence of insert, update, and delete statements, is called a transaction. Modi cations of tuples are temporarily stored in the database system. They become permanent only after the statement commit; has been issued.As long as the user has not issued the commit statement, it is possible to undo all modi cationssince the last commit. To undo modi cations, one has to issue the statement rollback;.It is advisable to complete each modi cation of the database with a commit (as long as the modi cation has the expected e ect). Note that any data de nition command such as create
table results in an internal commit. A commit is also implicitly executed when the user terminates an Oracle session.

Examples on data modelling

Composite primary key: the primary key is made up of more than one field.
Foreign key: one (or more) field from this table relates to the primary key of another table. Eg:

create table t_holiday
 (yr      integer
 ,country varchar(50)
 ,commnt  varchar(80)
 ,foreign key(country) references cia(name)
 ,primary key (yr,country)
 );

A foreign key should refer to a candidate key in some table. This is usually the primary key but may be a field (or list of fields) specified as UNIQUE.Eg:

create table t_a (
  i integer primary key);

create table t_b (
  j integer ,
foreign key (j) references t_a(i) );

You may not drop a table if it is referenced by another table.
drop table t_a; //Error!

The CASCADE CONSTRAINTS clause can be used to remove the references.
drop table t_a cascade constraints;

CREATE OR REPLACE will remove the old table if it exists.Eg:
create or replace table
     t_holiday (a integer);

Users must be granted RESOURCE to create tables.
grant resource to scott;

CAUTION! You may not use a reserved word as the name of a field. Many popular words are used by the system; some words to avoid... date, day, index, number, order, size, year. Eg: create table t_wrong (date date);

Case study
Case Study I: A threaded discussion forum:
create table bboard (
        msg_id   char(6) not null primary key,
        refers_to   char(6),
        email   varchar(200),
        name    varchar(200),
        one_line   varchar(700),
        message   clob,
        notify   char(1) default 'f' check (notify in ('t','f')),
        posting_time  date,
        sort_key   varchar(600)
);
Messages are uniquely keyed with msg_id, refer to each other (i.e., say "I'm a response to msg X") with refers_to, and a thread can be displayed conveniently by using the sort_key column.

alter table bboard add (originating_ip varchar(16));

It became apparent that we needed ways to:

·  display site history for users who had changed their email addresses

·  discourage problem users from burdening the moderators and the community

·  carefully tie together user-contributed content in the various subsystems

The solution was obvious to any experienced database nerd: a canonical users table and then content tables that reference it. Here's a simplified version of the data model, taken from the ArsDigita Community System:
create table users (
 user_id   integer not null primary key,
 first_names  varchar(100) not null,
 last_name  varchar(100) not null,
 email   varchar(100) not null unique,
        ..
);

create table bboard (
 msg_id  char(6) not null primary key,
 refers_to char(6),
 topic  varchar(100) not null references bboard_topics,
 category varchar(200), -- only used for categorized Q&A forums
 originating_ip varchar(16), -- stored as string, separated by periods
 user_id  integer not null references users,
 one_line varchar(700),
 message  clob,
 -- html_p - is the message in html or not
 html_p  char(1) default 'f' check (html_p in ('t','f')),
        ...
);

create table classified_ads (
 classified_ad_id integer not null primary key,
 user_id   integer not null references users,
 ...
);

Note that a contributor's name and email address no longer appear in the bboard table. That doesn't mean we don't know who posted a message. In fact, this data model can't even represent an anonymous posting: user_id integer not null references users requires that each posting be associated with a user ID and that there actually be a row in the users table with that ID.

Case Study II: Representing a Web Site core content. Its in continuation with the above Case Study.
Requirements:

·  We will need a table that holds the static pages themselves.

·  Since there are potentially many comments per page, we need a separate table to hold the user-submitted comments.

·  Since there are potentially many related links per page, we need a separate table to hold the user-submitted links.

·  Since there are potentially many authors for one page, we need a separate table to register the author-page many-to-one relation.

·  Considering the "help point readers to stuff that will interest them" objective, it seems that we need to store the category or categories under which a page falls. Since there are potentially many categories for one page, we need a separate table to hold the mapping between pages and categories.

create table static_pages (
 page_id  integer not null primary key,
 url_stub varchar(400) not null unique,
 original_author integer references users(user_id),
 page_title varchar(4000),
 page_body clob,
 obsolete_p char(1) default 'f' check (obsolete_p in ('t','f')),
 members_only_p char(1) default 'f' check (members_only_p in ('t','f')),
 price  number,
 copyright_info varchar(4000),
 accept_comments_p char(1) default 't' check (accept_comments_p in ('t','f')),
 accept_links_p  char(1) default 't' check (accept_links_p in ('t','f')),
 last_updated  date,
 -- used to prevent minor changes from looking like new content
 publish_date  date
);

create table static_page_authors (
 page_id  integer not null references static_pages,
 user_id  integer not null references users,
 notify_p char(1) default 't' check (notify_p in ('t','f')),
 unique(page_id,user_id)
);

Note that we use a generated integer page_id key for this table. Much better is to use Oracle's built-in sequence generation facility:

create sequence page_id_sequence start with 1;

Then we can get new page IDs by using page_id_sequence.nextval in INSERT statements.