Data Warehousing

Understanding Data Warehousing and Dimensional Modeling

An introduction to the subject of data warehousing -- objective, origin, and evolution; Inmon v. Kimball -- are they really different?; the basics of "cubes" v. star schemas, fact v. dimension tables (are they normalized?), snowflake; warehouse v. data marts; processing demands and new approaches for a data warehouse management system (e.g., star joins); architecting and managing the data warehouse environment.

TIME: could be one hour to a whole day.

WHERE: University of Wisconsin, Eau Claire, 20th annual information systems conference, 2005.

ABSTRACT

Most computer-based information systems primarily support the operational activities within an organization through transaction processing. Management support usually consists of query and reports based on the transaction data and perhaps a feedback loop to support the planning and control cycle with exception reports. Historically, we have done relatively little to provide substantial support to top management and analysts. They need the ability to look at the data aggregated across a broad time horizon to spot trends and patterns, and then be able to drill down into the detail to dissect and diagnose a situation.

Data warehousing holds out the promise of satisfying management and analyst’s needs for information. The evidence is in the emergence and rapid growth of data warehouse tool vendors, data warehousing consultants, books, and particularly the data warehousing initiatives in organizations today -- everyone seems to be doing something. When management gets a taste of the potential, they are willing to allocate substantial resources to develop data warehouses and the requisite infrastructure. It certainly appears to be more that just another fad. The IS community is finally beginning to satisfy the “other half” of their charter.

In this talk we will:

* review the evolution of Information Systems and how this is leading into data warehousing.

* explore the significant differences from operational/transaction processing that demand a substantially different approach to deploying information technology,

* describe the essential idea behind a data warehouse, which is NOT just a massive storehouse of data, as the name might suggest.

* contrast the relational view of data with the dimensional view of data, and how a dimensional view can be cast in a relational database as a STAR schema (and thus implemented using conventional DBMS tools).

* present the overall data warehousing environment, and the variations in its components, such as data marts, the atomic data warehouse, OLAP/ROLAP/MOLAP... tools, and deployment on the web.

* briefly outline some of the biggest problems in setting up and maintaining a data warehouse -- dealing with “dirty” and inconsistent data, tracking the “metadata,” and setting up an infrastructure to support an ongoing production environment.