1. What is the flow of loading data into fact & dimensional tables?
A) Fact table - Table with Collection of Foreign Keys corresponding to the Primary
Keys in Dimensional table. Consists of fields with numeric values.
Dimension table - Table with Unique Primary Key.
Load - Data should be first loaded into dimensional table. Based on the primary key
values in dimensional table, the data should be loaded into Fact table.
2. What is the default cache size? How do you change the cache size if needed?
A. Default cache size is 256 MB. We can increase it by going into Datastage
Administrator and selecting the Tunable Tab and specify the cache size over there.
3. What are types of Hashed File?
A) Hashed File is classified broadly into 2 types.
a) Static - Sub divided into 17 types based on Primary Key Pattern.
b) Dynamic - sub divided into 2 types
i) Generic ii) Specific.
Dynamic files do not perform as well as a well, designed static file, but do perform better
than a badly designed one. When creating a dynamic file you can specify the following
Although all of these have default values)
By Default Hashed file is "Dynamic - Type Random 30 D"
4. What does a Config File in parallel extender consist of?
A) Config file consists of the following.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location.
5. What is Modulus and Splitting in Dynamic Hashed File?
A. In a Hashed File, the size of the file keeps changing randomly.
If the size of the file increases it is called as "Modulus".
If the size of the file decreases it is called as "Splitting".
6. What are Stage Variables, Derivations and Constants?
A. Stage Variable - An intermediate processing variable that retains value during read
and doesn’t pass the value into target column.
Derivation - Expression that specifies value to be passed on to the target column.
Constant - Conditions that are either true or false that specifies flow of data with a link.
7. Types of views in Datastage Director?
There are 3 types of views in Datastage Director
a) Job View - Dates of Jobs Compiled.
b) Log View - Status of Job last run
c) Status View - Warning Messages, Event Messages, Program Generated Messages.
8. Types of Parallel Processing?
A) Parallel Processing is broadly classified into 2 types.
a) SMP - Symmetrical Multi Processing.
b) MPP - Massive Parallel Processing.
9. Orchestrate Vs Datastage Parallel Extender?
A) Orchestrate itself is an ETL tool with extensive parallel processing capabilities and
running on UNIX platform. Datastage used Orchestrate with Datastage XE (Beta version
of 6.0) to incorporate the parallel processing capabilities. Now Datastage has purchased
Orchestrate and integrated it with Datastage XE and released a new version Datastage 6.0
i.e Parallel Extender.
10. Importance of Surrogate Key in Data warehousing?
A) Surrogate Key is a Primary Key for a Dimension table. Most importance of using it is
it is independent of underlying database. i.e. Surrogate Key is not affected by the changes
going on with a database.
11. How to run a Shell Script within the scope of a Data stage job?
A) By using "ExcecSH" command at Before/After job properties.
12. How to handle Date conversions in Datastage? Convert a mm/dd/yyyy format to
A) We use a) "Iconv" function - Internal Conversion.
b) "Oconv" function - External Conversion.
Function to convert mm/dd/yyyy format to yyyy-dd-mm is
13 How do you execute datastage job from command line prompt?
A) Using "dsjob" command as follows.
dsjob -run -jobstatus projectname jobname
14. Functionality of Link Partitioner and Link Collector?
Link Partitioner: It actually splits data into various partitions or data flows using
various partition methods.
Link Collector: It collects the data coming from partitions, merges it into a single data
flow and loads to target.
15. Types of Dimensional Modeling?
A) Dimensional modeling is again sub divided into 2 types.
a) Star Schema - Simple & Much Faster. Denormalized form.
b) Snowflake Schema - Complex with more Granularity. More normalized form.
16. Differentiate Primary Key and Partition Key?
☻Page 4 of 210☻
Primary Key is a combination of unique and not null. It can be a collection of key values
called as composite primary key. Partition Key is a just a part of Primary Key. There are
several methods of partition like Hash, DB2, and Random etc. While using Hash partition
we specify the Partition Key.
17. Differentiate Database data and Data warehouse data?
A) Data in a Database is
a) Detailed or Transactional
b) Both Readable and Writable.
18. Containers Usage and Types?
Container is a collection of stages used for the purpose of Reusability.
There are 2 types of Containers.
a) Local Container: Job Specific
b) Shared Container: Used in any job within a project.
19. Compare and Contrast ODBC and Plug-In stages?
ODBC: a) Poor Performance.
b) Can be used for Variety of Databases.
c) Can handle Stored Procedures.
Plug-In: a) Good Performance.
b) Database specific. (Only one database)
c) Cannot handle Stored Procedures.
20. Dimension Modelling types along with their significance
Data Modelling is Broadly classified into 2 types.
a) E-R Diagrams (Entity - Relatioships).
b) Dimensional Modelling.
Q 21 What are Ascential Dastastage Products, Connectivity
Ascential DataStage EE (3)
Ascential DataStage EE MVS
Ascential DataStage TX
Ascential RTI (2)
Ascential Commerce Manager
Q 22 Explain Data Stage Architecture?
Data Stage contains two components,
_ Data Stage Administrator.
_ Data Stage Manager
_ Data Stage Designer
_ Data Stage Director
_ Data Stage Engine
_ Meta Data Repository
_ Package Installer
Data Stage Administrator:
Used to create the project.
Contains set of properties
We can set the buffer size (by default 128 MB)
We can increase the buffer size.
We can set the Environment Variables.
In tunable we have in process and inter-process
In-process—Data read in sequentially
Inter-process— It reads the data as it comes.
It just interfaces to metadata.
Data Stage Manager:
We can view and edit the Meta data Repository.
We can import table definitions.
We can export the Data stage components in .xml or .dsx format.
We can create routines and transforms
We can compile the multiple jobs.
Data Stage Designer:
We can create the jobs. We can compile the job. We can run the job. We can
declare stage variable in transform, we can call routines, transform, macros, functions.
We can write constraints.
Data Stage Director:
We can run the jobs.
We can schedule the jobs. (Schedule can be done daily, weekly, monthly, quarterly)
We can monitor the jobs.
We can release the jobs.
Q 23 What is Meta Data Repository?
Meta Data is a data about the data.
It also contains
_ Query statistics
_ ETL statistics
_ Business subject area
_ Source Information
_ Target Information
_ Source to Target mapping Information.
Q 24 What is Data Stage Engine?
It is a JAVA engine running at the background.
Q 25 What is Dimensional Modeling?
Dimensional Modeling is a logical design technique that seeks to present the data
in a standard framework that is, intuitive and allows for high performance access.
Q 26 What is Star Schema?
Star Schema is a de-normalized multi-dimensional model. It contains centralized fact
tables surrounded by dimensions table.
Dimension Table: It contains a primary key and description about the fact table.
Fact Table: It contains foreign keys to the dimension tables, measures and aggregates.
Q 27 What is surrogate Key?
It is a 4-byte integer which replaces the transaction / business / OLTP key in the
dimension table. We can store up to 2 billion record.
Q 28 Why we need surrogate key?
It is used for integrating the data may help better for primary key.
Index maintenance, joins, table size, key updates, disconnected inserts and
Q 29 What is Snowflake schema?
It is partially normalized dimensional model in which at two represents least one
dimension or more hierarchy related tables.
Q 30 Explain Types of Fact Tables?
Factless Fact: It contains only foreign keys to the dimension tables.
Additive Fact: Measures can be added across any dimensions.
Semi-Additive: Measures can be added across some dimensions. Eg, % age, discount
Non-Additive: Measures cannot be added across any dimensions. Eg, Average
☻Page 8 of 210☻
Conformed Fact: The equation or the measures of the two fact tables are the same under
the facts are measured across the dimensions with a same set of measures.
Q 31 Explain the Types of Dimension Tables?
Conformed Dimension: If a dimension table is connected to more than one fact table,
the granularity that is defined in the dimension table is common across between the fact
Junk Dimension: The Dimension table, which contains only flags.
Monster Dimension: If rapidly changes in Dimension are known as Monster Dimension.
De-generative Dimension: It is line item-oriented fact table design.
Q 32 What are stage variables?
Stage variables are declaratives in Transformer Stage used to store values. Stage
variables are active at the run time. (Because memory is allocated at the run time).
Q 33 What is sequencer?
It sets the sequence of execution of server jobs.
Q 34 What are Active and Passive stages?
Active Stage: Active stage model the flow of data and provide mechanisms for
combining data streams, aggregating data and converting data from one data type to
another. Eg, Transformer, aggregator, sort, Row Merger etc.
Passive Stage: A Passive stage handles access to Database for the extraction or writing
of data. Eg, IPC stage, File types, Universe, Unidata, DRS stage etc.
Q 35 What is ODS?
Operational Data Store is a staging area where data can be rolled back.
Q 36 What are Macros?
They are built from Data Stage functions and do not require arguments.
A number of macros are provided in the JOBCONTROL.H file to facilitate getting
information about the current job, and links and stages belonging to the current job.
These can be used in expressions (for example for use in Transformer stages), job control
routines, filenames and table names, and before/after subroutines.
These macros provide the functionality of using the DSGetProjectInfo, DSGetJobInfo,
DSGetStageInfo, and DSGetLinkInfo functions with the DSJ.ME token as the JobHandle
and can be used in all active stages and before/after subroutines. The macros provide the
functionality for all the possible InfoType arguments for the DSGet…Info functions. See
the Function call help topics for more details.
The available macros are:
☻Page 9 of 210☻
2) To obtain the name of the current job:
3) MyName = DSJobName
To obtain the full current stage name:
MyName = DSJobName : ″.″ : DSStageName
Q 37 What is keyMgtGetNextValue?
It is a Built-in transform it generates Sequential numbers. Its input type is literal string &
output type is string.
Q 38 What are stages?
The stages are either passive or active stages.
Passive stages handle access to databases for extracting or writing data.
Active stages model the flow of data and provide mechanisms for combining data
streams, aggregating data, and converting data from one data type to another.
Q 39 What index is created on Data Warehouse?
Bitmap index is created in Data Warehouse.
Q 40 What is container?
A container is a group of stages and links. Containers enable you to simplify and
modularize your server job designs by replacing complex areas of the diagram with a
single container stage. You can also use shared containers as a way of incorporating
server job functionality into parallel jobs.
DataStage provides two types of container:
Local containers. These are created within a job and are only accessible by that
job. A local container is edited in a tabbed page of the job’s Diagram window.
Shared containers. These are created separately and are stored in the Repository
in the same way that jobs are. There are two types of shared container
Q 41 What is function? ( Job Control – Examples of Transform Functions )
Functions take arguments and return a value.
_ BASIC functions: A function performs mathematical or string manipulations on
the arguments supplied to it, and return a value. Some functions have 0
arguments; most have 1 or more. Arguments are always in parentheses, separated
by commas, as shown in this general syntax:
FunctionName (argument, argument)
_ DataStage BASIC functions: These functions can be used in a job control
routine, which is defined as part of a job’s properties and allows other jobs to be
run and controlled from the first job. Some of the functions can also be used for
getting status information on the current job; these are useful in active stage
expressions and before- and after-stage subroutines.
To do this ... Use this function ...
Specify the job you want to control DSAttachJob
Set parameters for the job you want to control DSSetParam
Set limits for the job you want to control DSSetJobLimit
Request that a job is run DSRunJob
Wait for a called job to finish DSWaitForJob
Gets the meta data details for the specified link DSGetLinkMetaData
Get information about the current project DSGetProjectInfo
Get buffer size and timeout value for an IPC or Web Service
Get information about the controlled job or current job DSGetJobInfo
Get information about the meta bag properties associated with
the named job
Get information about a stage in the controlled job or current
Get the names of the links attached to the specified stage DSGetStageLinks
Get a list of stages of a particular type in a job. DSGetStagesOfType
Get information about the types of stage in a job. DSGetStageTypes
Get information about a link in a controlled job or current job DSGetLinkInfo
Get information about a controlled job’s parameters DSGetParamInfo
Get the log event from the job log DSGetLogEntry
Get a number of log events on the specified subject from the
Get the newest log event, of a specified type, from the job log DSGetNewestLogId
Log an event to the job log of a different job DSLogEvent
Stop a controlled job DSStopJob
Return a job handle previously obtained from DSAttachJob DSDetachJob
Log a fatal error message in a job's log file and aborts the job. DSLogFatal
Log an information message in a job's log file. DSLogInfo
Put an info message in the job log of a job controlling current
Log a warning message in a job's log file. DSLogWarn
Generate a string describing the complete status of a valid
Insert arguments into the message template. DSMakeMsg
Ensure a job is in the correct state to be run or validated. DSPrepareJob
Interface to system send mail facility. DSSendMail
Log a warning message to a job log file. DSTransformError
Convert a job control status or error code into an explanatory
Suspend a job until a named file either exists or does not exist. DSWaitForFile
Checks if a BASIC routine is cataloged, either in VOC as a
callable item, or in the catalog space.
Execute a DOS or Data Stage Engine command from a
Set a status message for a job to return as a termination
message when it finishes
Q 42 What is Routines?
Routines are stored in the Routines branch of the Data Stage Repository, where you can
create, view or edit. The following programming components are classified as routines:
Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX
(OLE) functions, Web Service routines
Q 43 What is data stage Transform?
Q 44 What is Meta Brokers?
Q 45 What is usage analysis?
Q 46 What is job sequencer?
☻Page 12 of 210☻
Q 47 What are different activities in job sequencer?
Q 48 What are triggers in data Stages? (conditional, unconditional, otherwise)
Q 49 Are u generated job Reports? S
Q 50 What is plug-in?
Q 51 Have u created any custom transform? Explain? (Oconv)