What is a Data Warehouse?
Post date: Jun 16, 2013 11:57:16 AM
What is a Data Warehouse?
A data warehouse is a centralized data store collating data, from multiple sources, making it easier to share data across multiple platforms. Data from various data stores can be pulled or pushed, reports created for rich media presentations, websites, also manages big data.
Figure 1
DATA: these points can be: sql server, sas enterprise miner, oracle, and spread sheet data
What
Typically, a relational database designed for query and analysis, rather than transactional queries.
A place where historical data is stored, for archival purposes, analysis and security.
Contains raw or formatted data
Combines data from multiple sources.
Why
Data is stored in various types of format and in different locations.
Data in various sources is not easily accessed such as operational databases (OLTP).
Online transactional processing systems are and not designed for analysing data, this data is also often live data which is constantly updating and changing.
Data is often stored in different platforms and different locations.
DATA
Subject orientated –Decision based
Integrated – various platforms
Non-volatile – not changing
Time variant – showing changes over time
BIG DATA
Defined as; too much volume, velocity, variability, potentially 5 petabyte or more, to work on normal database architectures. Such as Google, Facebook, twitter etc.
Testing / developing a data warehouse
Business analysts ------ get requirements
Q-A tester ----------- develops and executes test cases.
Architects ----------- set up test environment
Developers ------------- reform test units
Database administrators ------- reform functionality (user acceptance tests)
Testing Strategy
Sources
ETL layer
The data warehouse
The business intelligence front end (dashboards) (Scorecard)
Test all entry point
Testing the dashboard
Logical calculations – compare data M art to report
Data validation –create equivalent sql queries from dashboard and data mart.
Functional TESTING.
Make sure create reports based on user requirements
Verify Field lengths data elements
Use automation tools
Test speed + performance + workloads, workflows and wait time.
EXTRACT – TRANSFORM LOAD
WHY
The data ware house needs to load data from various sources, the data EXTRACTED needs to be TRANSFORMED into a format which can be used by the data warehouse for analysis.
EXTRACT: from OLTP SYSTEM and copied to the ware house
Transform: remove inconsistences, format to a common format, add missing fields derive new fields to store calculated data.
Load: Map data into the warehouse
Mapping document:
Source to target map: required to plan the etl process.
Capture business rules
Data flow mapping
Data movement requirements
Mapping document specification:
SOURCE: Input Definition
TARGET: output formats
Business and data transformation rules
Data quality requirements
Source & Target SQL is available here also.