Database Views
 

 The relational database provides programmers with a high degree of abstraction from the physical world of the computer. A view is a way of building even greater abstraction. Suppose that Jane in marketing says that she wants to see a table containing the following information:

  • user_id
  • email address
  • number of static pages viewed
  • number of bboard postings made
  • number of comments made

This information is spread out among four tables.

select u.user_id,
       u.email,
       count(ucm.page_id) as n_pages,
       count(bb.msg_id) as n_msgs,
       count(c.comment_id) as n_comments
from users u, user_content_map ucm, bboard bb, comments c
where u.user_id = ucm.user_id(+)
and u.user_id = bb.user_id(+)
and u.user_id = c.user_id(+)
group by u.user_id, u.email
order by upper(email)

Then Jane adds "I want to see this every day, updated with the latest information. I want to have a programmer write me some desktop software that connects directly to the database and looks at this information; I don't want my desktop software breaking if you reorganize the data model." Note: The outer join adds NULLs to every column in the report where there was no corresponding row in the user_content_map table (ie for those registered users who have not placed any content in the bulletin board).
create or replace view janes_marketing_view
as
select u.user_id,
       u.email,
       count(ucm.page_id) as n_pages,
       count(bb.msg_id) as n_msgs,
       count(c.comment_id) as n_comments
from users u, user_content_map ucm, bboard bb, comments c
where u.user_id = ucm.user_id(+)
and u.user_id = bb.user_id(+)
and u.user_id = c.user_id(+)
group by u.user_id, u.email
order by upper(u.email)

To Jane, this will look and act just like a table when she queries it:

select * from janes_marketing_view;

Why should she need to be aware that information is coming from four tables? Or that you've reorganized the RDBMS so that the information subsequently comes from six tables?

Protecting Privacy with Views

A common use of views is protecting confidential data. For example, suppose that all the people who work in a hospital collaborate by using a relational database. Here is the data model:

create table patients (
       patient_id     integer primary key,
       patient_name   varchar(100),
       hiv_positive_p char(1),
       insurance_p    char(1),
       ...
);

If a bunch of hippie idealists are running the hospital, they'll think that the medical doctors shouldn't be aware of a patient's insurance status. So when a doc is looking up a patient's medical record, the looking is done through

create view patients_clinical
as
select patient_id, patient_name, hiv_positive_p from patients;

The folks over in accounting shouldn't get access to the patients' medical records just because they're trying to squeeze money out of them:

create view patients_accounting
as
select patient_id, patient_name, insurance_p from patients;

Relational databases have elaborate permission systems similar to those on time-shared computer systems. Each person in a hospital has a unique database user ID. Permission will be granted to view or modify certain tables on a per-user or per-group-of-users basis. Generally the RDBMS permissions facilities aren't very useful for Web applications. It is the Web server that is talking to the database, not a user's desktop computer. So the Web server is responsible for figuring out who is requesting a page and how much to show in response.

Protecting Your Own Source Code

The ArsDigita Shoppe system, described in http://www.arsdigita.com/books/panda/ecommerce, represents all orders in one table, whether they were denied by the credit card processor, returned by the user, or voided by the merchant. This is fine for transaction processing but you don't want your accounting or tax reports corrupted by the inclusion of failed orders. You can make a decision in one place as to what constitutes a reportable order and then have all of your report programs query the view:

create or replace view sh_orders_reportable
as
select * from sh_orders
where order_state not in ('confirmed','failed_authorization','void');

Note that in the privacy example (above) we were using the view to leave unwanted columns behind whereas here we are using the view to leave behind unwanted rows.
If we add some order states or otherwise change the data model, the reporting programs need not be touched; we only have to keep this view definition up to date. Note that you can define every view with "create or replace view" rather than "create view"; this saves a bit of typing when you have to edit the definition later.

If you've used select * to define a view and subsequently alter any of the underlying tables, you have to redefine the view. Otherwise, your view won't contain any of the new columns. You might consider this a bug but Oracle has documented it, thus turning the behavior into a feature.

How Views Work

Programmers aren't supposed to have to think about how views work. However, it is worth noting that the RDBMS merely stores the view definition and not any of the data in a view. Querying against a view versus the underlying tables does not change the way that data are retrieved or cached. Standard RDBMS views exist to make programming more convenient or to address security concerns, not to make data access more efficient.

Materialized Views

In other words, the view might be created with a complicated JOIN, or an expensive GROUP BY with sums and averages. With a regular view, this expensive operation would be done every time you issued a query. With a materialized view, the expensive operation is done when the view is created and thus an individual query need not involve substantial computation.
Materialized views consume space because Oracle is keeping a copy of the data or at least a copy of information derivable from the data. More importantly, a materialized view does not contain up-to-the-minute information. When you query a regular view, your results includes changes made up to the last committed transaction before your SELECT. When you query a materialized view, you're getting results as of the time that the view was created or refreshed. Note that Oracle lets you specify a refresh interval at which the materialized view will automatically be refreshed. Such views are also called summaries.
At this point, you'd expect an experienced Oracle user to say "Hey, these aren't new. This is the old CREATE SNAPSHOT facility that we used to keep semi-up-to-date copies of tables on machines across the network!" What is new with materialized views is that you can create them with the ENABLE QUERY REWRITE option. This authorizes the SQL parser to look at a query involving aggregates or JOINs and go to the materialized view instead.
For each month, we have a count of how many users registered at photo.net. To execute the query, Oracle must sequentially scan the users table. If the users table grew large and you wanted the query to be instant, you'd sacrifice some timeliness in the stats with

create materialized view users_by_month
   enable query rewrite
   refresh complete
   start with 1999-03-28
   next sysdate + 1
   as
   select
     to_char(registration_date,'YYYYMM') as sort_key,
     rtrim(to_char(registration_date,'Month')) as pretty_month,
     to_char(registration_date,'YYYY') as pretty_year,
     count(*) as n_new
   from users
   group by
     to_char(registration_date,'YYYYMM'),
     to_char(registration_date,'Month'),
     to_char(registration_date,'YYYY')
   order by 1;

Oracle will build this view just after midnight on March 28, 1999. The view will be refreshed every 24 hours after that. Because of the enable query rewrite clause, Oracle will feel free to grab data from the view even when a user's query does not mention the view. For example, given the query

select count(*)
from users
where rtrim(to_char(registration_date,'Month')) = 'January'
and to_char(registration_date,'YYYY') = '1999'

Oracle would ignore the users table altogether and pull information from users_by_month. This would give the same result with much less work. Suppose that the current month is March 1999, though. The query

select count(*)
from users
where rtrim(to_char(registration_date,'Month')) = 'March'
and to_char(registration_date,'YYYY') = '1999'

will also hit the materialized view rather than the users table and hence will miss anyone who has registered since midnight (i.e., the query rewriting will cause a different result to be returned).