Database Transactions
 

The simplest and most direct interface to a relational database involves a procedural program in C, Java, Lisp, Perl, or Tcl putting together a string of SQL that is then sent to to the RDBMS. Here's how the ArsDigita Community System constructs a new entry in the clickthrough log:

insert into clickthrough_log
 (local_url, foreign_url, entry_date, click_count)
values
 ('$local_url', '$foreign_url', trunc(sysdate), 1)"

The INSERT statement adds one row, filling in the four list columns. Two of the values come from local variables set within the Web server, $local_url and $foreign_url. Because these are strings, they must be surrounded by single quotes. One of the values is dynamic and comes straight from Oracle: trunc(sysdate). Recall that the date data type in Oracle is precise to the second. We only want one of these rows per day of the year and hence truncate the date to midnight. Finally, as this is the first clickthrough of the day, we insert a constant value of 1 for click_count.

Atomicity

Each SQL statement executes as an atomic transaction. For example, suppose that you were to attempt to purge some old data with

delete from clickthrough_log where entry_date + 120 < sysdate;

(delete clickthrough records more than 120 days old) and that 3500 rows in clickthrough_log are older than 120 days. If your computer failed halfway through the execution of this DELETE, i.e., before the transaction committed, you would find that none of the rows had been deleted. Either all 3500 rows will disappear or none will. More interestingly, you can wrap a transaction around multiple SQL statements.

Note: However, if what you're actually doing is moving data from one place within the RDBMS to another, it is extremely bad taste to drag it all the way out to an application program and then stuff it back in. Much better to use the "INSERT ... SELECT" form. It is legal in SQL to put function calls or constants in your select list. You can compute multiple values in a single query:

select posting_time, 2+2
from bboard
where msg_id = '000KWj';

POSTING_TI   2+2
---------- ----------
1998-12-13     4

Consider a comment editing transaction and look at the basic structure:

  • open a transaction
  • insert into an audit table whatever comes back from a SELECT statement on the comment table
  • update the comment table
  • close the transaction

Suppose that something goes wrong during the INSERT. The tablespace in which the audit table resides is full and it isn't possible to add a row. Putting the INSERT and UPDATE in the same RDBMS transactions ensures that if there is a problem with one, the other won't be applied to the database.

Consistency

Suppose that we've looked at a message on the bulletin board and decide that its content is so offensive we wish to delete the user from our system:

select user_id from bboard where msg_id = '000KWj';

   USER_ID
----------
     39685

delete from users where user_id = 39685;
*
ERROR at line 1:
ORA-02292: integrity constraint (PHOTONET.SYS_C001526) violated - child record
found

Oracle has stopped us from deleting user 39685 because to do so would leave the database in an inconsistent state. Here's the definition of the bboard table:

create table bboard (
 msg_id  char(6) not null primary key,
 refers_to char(6),
        ...
 user_id  integer not null references users,
 one_line varchar(700),
 message  clob,
 ...
);

The user_id column is constrained to be not null. Furthermore, the value in this column must correspond to some row in the users table (references users). By asking Oracle to delete the author of msg_id 000KWj from the users table before we deleted all of his or her postings from the bboard table, we were asking Oracle to leave the RDBMS in an inconsistent state.

Mutual Exclusion

 When you have multiple simultaneously executing copies of the same program, you have to think about mutual exclusion. If a program has to

  • read a value from the database
  • perform a computation based on that value
  • update the value in the database based on the computation

Then you want to make sure only one copy of the program is executing at a time through this segment. First, anything having to do with locks only makes sense when the three operations are grouped together in a transaction. Second, to avoid deadlocks a transaction must acquire all the resources (including locks) that it needs at the start of the transaction. A SELECT in Oracle does not acquire any locks but a SELECT .. FOR UPDATE does. Here's the beginning of the transaction that inserts a message into the bboard table (from /bboard/insert-msg.tcl):

select last_msg_id
from msg_id_generator
for update of last_msg_id

Much more efficient is simply to start the transaction with

lock table an_alert_log in exclusive mode;

This is a big hammer and you don't want to hold a table lock for more than an instant.

What if I just want some unique numbers?

Does it really have to be this hard? What if you just want some unique integers, each of which will be used as a primary key? Consider a table to hold news items for a Web site:

create table news (
 title  varchar(100) not null,
 body  varchar(4000) not null,
 release_date date not null,
        ...
);
The traditional database design that gets around all of the problems is the use of a generated key. Here's how the news module of the ArsDigita Community System works,

create sequence news_id_sequence start with 1;

create table news (
 news_id  integer primary key,
 title  varchar(100) not null,
 body  varchar(4000) not null,
 release_date date not null,
 ...
);

We're taking advantage of the nonstandard but very useful Oracle sequence facility. In almost any Oracle SQL statement, you can ask for a sequence's current value or next value.
SQL> create sequence foo_sequence;
SQL> select foo_sequence.nextval from dual;
SQL> select foo_sequence.currval from dual;

You can use the sequence generator directly in an insert, e.g.,

insert into news (news_id, title, body, release_date)
values
(news_id_sequence.nextval,
 'Tuition Refund at MIT',
 'Administrators were shocked and horrified ...',
 '1998-03-12);

Caveats for Sequence: Oracle sequences are optimized for speed. Hence they offer the minimum guarantees that Oracle thinks are required for primary key generation and no more. If you ask for a few nextvals and roll back your transaction, the sequence will not be rolled back. You can't rely on sequence values to be, uh, sequential. They will be unique. They will be monotonically increasing. But there might be gaps. The gaps arise because Oracle pulls, by default, 20 sequence values into memory and records those values as used on disk. This makes nextval very fast since the new value need only be marked use in RAM and not on disk. But suppose that someone pulls the plug on your database server after only two sequence values have been handed out. If your database administrator and system administrator are working well together, the computer will come back to life running Oracle. But there will be a gap of 18 values in the sequence (e.g., from 2023 to 2041). That's because Oracle recorded 20 values used on disk and only handed out 2. So till the time your application (using Oracle of course!) requires only uniqueness of ids, use of Oracle generated sequences will do....or else you will have to write your own sequence generators based on some logic deemed fit by you for your kind of application.