DATASTAGE QUESTIONS

1. What is the flow of loading data into fact & dimensional tables?

A) Fact table - Table with a collection of foreign keys corresponding to the primary keys in the dimension tables. It also consists of fields holding numeric measure values.

Dimension table - Table with a unique primary key.

Load - Data should first be loaded into the dimension tables. Based on the primary key values in the dimension tables, the data is then loaded into the fact table.
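For example, a Customer dimension would be loaded or updated first; the Sales fact rows would then be loaded, with each row's customer surrogate key obtained by looking up the customer's business key in the already-loaded dimension table (the table names here are only illustrative).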

2. What is the default cache size? How do you change the cache size if needed?

A. Default cache size is 256 MB. We can increase it by going into the DataStage Administrator, selecting the Tunables tab and specifying the cache size there.

3. What are types of Hashed File?

A) Hashed File is classified broadly into 2 types.

a) Static - Sub divided into 17 types based on Primary Key Pattern.

b) Dynamic - sub divided into 2 types

i) Generic ii) Specific.

Dynamic files do not perform as well as a well-designed static file, but they do perform better than a badly designed one. When creating a dynamic file you can specify several parameters, although all of them have default values.

By default a hashed file is "Dynamic - Type Random 30 D".

4. What does a Config File in parallel extender consist of?

A) Config file consists of the following.

a) Number of Processes or Nodes.

b) Actual Disk Storage Location.
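A minimal one-node configuration file might look like the following sketch (the host name and directory paths are hypothetical):

{
  node "node1"
  {
    fastname "etlserver1"
    pools ""
    resource disk "/data/ds/dataset1" {pools ""}
    resource scratchdisk "/data/ds/scratch1" {pools ""}
  }
}

Adding further node entries increases the degree of parallelism; each node names its processing host (fastname) and the disk and scratch-disk locations it may use.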

5. What is Modulus and Splitting in Dynamic Hashed File?

A. In a dynamic hashed file, the size of the file changes as records are added and removed.

Modulus is the number of groups (buckets) that the file currently contains.

Splitting is the process of dividing a group and adding a new one when the file grows past its split threshold, which increases the modulus.

6. What are Stage Variables, Derivations and Constraints?

A. Stage Variable - An intermediate processing variable that retains its value from row to row and is not passed to a target column.

Derivation - An expression that specifies the value to be passed on to the target column.

Constraint - A condition that evaluates to true or false and controls whether a row flows down a link.

7. Types of views in Datastage Director?

There are 3 types of views in DataStage Director:

a) Job View - shows the jobs, with the dates they were compiled and last run.

b) Status View - shows the status of the last run of each job.

c) Log View - shows warning messages, event messages and program-generated messages for a job run.

8. Types of Parallel Processing?

A) Parallel Processing is broadly classified into 2 types.

a) SMP - Symmetrical Multi Processing.

b) MPP - Massively Parallel Processing.

9. Orchestrate Vs Datastage Parallel Extender?

A) Orchestrate itself is an ETL tool with extensive parallel processing capabilities that runs on UNIX platforms. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to incorporate parallel processing capabilities. Ascential then acquired Orchestrate and integrated it with DataStage XE, releasing a new version, DataStage 6.0, i.e. Parallel Extender.

10. Importance of Surrogate Key in Data warehousing?

A) A surrogate key is the primary key of a dimension table. Its main advantage is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes going on in the source database.

11. How to run a Shell Script within the scope of a Data stage job?

A) By using the "ExecSH" command in the Before/After-job subroutine properties of the job.
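For example (a hedged sketch; the script path and argument are hypothetical), set the Before-job subroutine to ExecSH and its Input Value to /home/dsadm/scripts/prepare_load.sh 20060321. The script's output is written to the job log, and a non-zero exit status will normally abort the job run.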

12. How to handle Date conversions in Datastage? Convert a mm/dd/yyyy format to

yyyy-dd-mm?

A) We use a) the "Iconv" function - internal conversion (converts an external string into DataStage's internal format).

b) the "Oconv" function - external conversion (converts the internal format back into an output string).

The expression to convert mm/dd/yyyy format to yyyy-dd-mm is

Oconv(Iconv(FieldName, "D/MDY[2,2,4]"), "D-YDM[4,2,2]")
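For example (the value is only illustrative), Iconv("03/21/2006", "D/MDY[2,2,4]") converts the string to the internal day number, and applying Oconv with "D-YDM[4,2,2]" to that number produces "2006-21-03".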

13. How do you execute a DataStage job from the command line prompt?

A) Using "dsjob" command as follows.

dsjob -run -jobstatus projectname jobname
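For example (the project, job and parameter names are hypothetical):

dsjob -run -param LoadDate=2006-03-21 -jobstatus MyProject MyJob

This runs MyJob with the LoadDate parameter set, waits for it to finish, and returns the job's finishing status as the command's exit status.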

14. Functionality of Link Partitioner and Link Collector?

Link Partitioner: It actually splits data into various partitions or data flows using

various partition methods.

Link Collector: It collects the data coming from partitions, merges it into a single data

flow and loads to target.

15. Types of Dimensional Modeling?

A) Dimensional modeling is again sub divided into 2 types.

a) Star Schema - Simple & Much Faster. Denormalized form.

b) Snowflake Schema - Complex with more Granularity. More normalized form.

16. Differentiate Primary Key and Partition Key?


A primary key is a combination of unique and not null. It can be a collection of key columns, called a composite primary key. A partition key is just a part of the primary key. There are several partitioning methods, such as Hash, DB2, Random, etc. When using Hash partitioning we specify the partition key.

17. Differentiate Database data and Data warehouse data?

A) Data in a database is

a) Detailed or transactional

b) Both readable and writable

c) Current

Data in a data warehouse is

a) Summarized or aggregated

b) Read-only (non-volatile)

c) Historical

18. Containers Usage and Types?

Container is a collection of stages used for the purpose of Reusability.

There are 2 types of Containers.

a) Local Container: Job Specific

b) Shared Container: Used in any job within a project.

19. Compare and Contrast ODBC and Plug-In stages?

ODBC: a) Poor Performance.

b) Can be used for Variety of Databases.

c) Can handle Stored Procedures.

Plug-In: a) Good Performance.

b) Database specific. (Only one database)

c) Cannot handle Stored Procedures.

20. Dimension Modelling types along with their significance

Data Modelling is Broadly classified into 2 types.

a) E-R Diagrams (Entity - Relationships).

b) Dimensional Modelling.

Q 21 What are the Ascential DataStage products and connectivity options?

Ans:

Ascential Products

Ascential DataStage

Ascential DataStage EE (3)

Ascential DataStage EE MVS

Ascential DataStage TX

Ascential QualityStage

Ascential MetaStage

Ascential RTI (2)

Ascential ProfileStage

Ascential AuditStage

Ascential Commerce Manager

Industry Solutions

Connectivity

Files

RDBMS

Real-time

PACKs

EDI

Other

Q 22 Explain Data Stage Architecture?

Data Stage contains two components:

Client Component.

Server Component.

Client Component:

_ Data Stage Administrator.

_ Data Stage Manager

_ Data Stage Designer

_ Data Stage Director

Server Components:

_ Data Stage Engine

_ Meta Data Repository

_ Package Installer

Data Stage Administrator:

Used to create projects.

Contains a set of project properties.

We can set the buffer size (by default 128 MB) and increase it if needed.

We can set the environment variables.

In the Tunables tab we can choose between in-process and inter-process row buffering:

In-process - the stages of a job run in a single process and rows are buffered within it.

Inter-process - stages run as separate processes and exchange rows through a buffer, so they can run simultaneously.

The Administrator itself only provides an interface to the metadata.

Data Stage Manager:

We can view and edit the Meta data Repository.

We can import table definitions.

We can export the Data stage components in .xml or .dsx format.

We can create routines and transforms

We can compile multiple jobs.

Data Stage Designer:

We can create, compile and run jobs. We can declare stage variables in a Transformer, and call routines, transforms, macros and functions. We can write constraints.

Data Stage Director:

We can run the jobs.

We can schedule the jobs. (Schedule can be done daily, weekly, monthly, quarterly)

We can monitor the jobs.

We can release the jobs.

Q 23 What is Meta Data Repository?

Meta data is data about the data.

It also contains

_ Query statistics

_ ETL statistics

_ Business subject area

_ Source Information

_ Target Information

_ Source to Target mapping Information.

Q 24 What is Data Stage Engine?

It is the server engine, derived from the UniVerse database engine, that runs in the background and executes DataStage jobs.

Q 25 What is Dimensional Modeling?

Dimensional modeling is a logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access.

Q 26 What is Star Schema?

Star schema is a de-normalized multi-dimensional model. It contains a central fact table surrounded by dimension tables.

Dimension Table: It contains a primary key and descriptive attributes of the dimension.

Fact Table: It contains foreign keys to the dimension tables, measures and aggregates.

Q 27 What is surrogate Key?

It is typically a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table. With a 4-byte integer we can store up to about 2 billion records.

Q 28 Why we need surrogate key?

It is used to integrate data from multiple sources and makes a better primary key than the source system key. It also helps with index maintenance, joins, table size, key updates, disconnected inserts and partitioning.

Q 29 What is Snowflake schema?

It is a partially normalized dimensional model in which at least one dimension is represented by two or more hierarchically related tables.

Q 30 Explain Types of Fact Tables?

Factless Fact: It contains only foreign keys to the dimension tables.

Additive Fact: Measures can be added across any dimensions.

Semi-Additive: Measures can be added across some dimensions but not others, e.g. an account balance, which cannot be summed over time.

Non-Additive: Measures cannot be added across any dimension, e.g. percentages, ratios, averages.


Conformed Fact: The same measure, defined and calculated identically, appears in two or more fact tables, so the facts can be compared or combined across those tables.

Q 31 Explain the Types of Dimension Tables?

Conformed Dimension: A dimension table that is connected to more than one fact table; the granularity and meaning defined in the dimension are common across those fact tables.

Junk Dimension: A dimension table which contains only flags and other low-cardinality indicators.

Monster Dimension: A very large dimension, especially one whose rows also change rapidly.

Degenerate Dimension: A dimension attribute (such as an order or invoice number) stored in the fact table with no corresponding dimension table; typical of line-item-oriented fact table designs.

Q 32 What are stage variables?

Stage variables are variables declared in the Transformer stage and used to store values during processing. They exist only at run time (because memory is allocated at run time).

Q 33 What is sequencer?

It sets the sequence of execution of server jobs.

Q 34 What are Active and Passive stages?

Active Stage: Active stages model the flow of data and provide mechanisms for combining data streams, aggregating data and converting data from one data type to another. E.g. Transformer, Aggregator, Sort, Row Merger, etc.

Passive Stage: A passive stage handles access to databases for the extraction or writing of data. E.g. IPC stage, file stages, UniVerse, UniData, DRS stage, etc.

Q 35 What is ODS?

Operational Data Store is a staging area where data can be rolled back.

Q 36 What are Macros?

They are built from Data Stage functions and do not require arguments.

A number of macros are provided in the JOBCONTROL.H file to facilitate getting

information about the current job, and links and stages belonging to the current job.

These can be used in expressions (for example for use in Transformer stages), job control

routines, filenames and table names, and before/after subroutines.

These macros provide the functionality of using the DSGetProjectInfo, DSGetJobInfo,

DSGetStageInfo, and DSGetLinkInfo functions with the DSJ.ME token as the JobHandle

and can be used in all active stages and before/after subroutines. The macros provide the

functionality for all the possible InfoType arguments for the DSGet…Info functions. See

the Function call help topics for more details.

The available macros are:

DSHostName

DSProjectName

DSJobStatus

DSJobName


DSJobController

DSJobStartDate

DSJobStartTime

DSJobStartTimestamp

DSJobWaveNo

DSJobInvocations

DSJobInvocationId

DSStageName

DSStageLastErr

DSStageType

DSStageInRowNum

DSStageVarList

DSLinkRowCount

DSLinkLastErr

DSLinkName

Examples

To obtain the name of the current job:

MyName = DSJobName

To obtain the full current stage name:

MyName = DSJobName : "." : DSStageName

Q 37 What is keyMgtGetNextValue?

It is a built-in transform that generates sequential numbers. Its input type is a literal string and its output type is string.
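For example (the sequence name is hypothetical), a derivation of KeyMgtGetNextValue("CustomerKey") returns the next number in the sequence named "CustomerKey", which can be used to populate a surrogate key column; KeyMgtGetNextValueConcurrent is the variant intended for jobs that may run concurrently.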

Q 38 What are stages?

The stages are either passive or active stages.

Passive stages handle access to databases for extracting or writing data.

Active stages model the flow of data and provide mechanisms for combining data

streams, aggregating data, and converting data from one data type to another.

Q 39 What index is created on Data Warehouse?

Bitmap index is created in Data Warehouse.

Q 40 What is container?

A container is a group of stages and links. Containers enable you to simplify and

modularize your server job designs by replacing complex areas of the diagram with a

single container stage. You can also use shared containers as a way of incorporating

server job functionality into parallel jobs.

DataStage provides two types of container:

• Local containers. These are created within a job and are only accessible by that

job. A local container is edited in a tabbed page of the job’s Diagram window.

• Shared containers. These are created separately and are stored in the Repository

in the same way that jobs are. There are two types of shared container: server shared containers (for use in server jobs) and parallel shared containers (for use in parallel jobs).

Q 41 What is a function? (Job Control – Examples of Transform Functions)

Functions take arguments and return a value.

_ BASIC functions: A function performs mathematical or string manipulations on

the arguments supplied to it, and returns a value. Some functions have 0

arguments; most have 1 or more. Arguments are always in parentheses, separated

by commas, as shown in this general syntax:

FunctionName (argument, argument)

_ DataStage BASIC functions: These functions can be used in a job control

routine, which is defined as part of a job’s properties and allows other jobs to be

run and controlled from the first job. Some of the functions can also be used for

getting status information on the current job; these are useful in active stage

expressions and before- and after-stage subroutines.

To do this ... Use this function ...
Specify the job you want to control DSAttachJob
Set parameters for the job you want to control DSSetParam
Set limits for the job you want to control DSSetJobLimit
Request that a job is run DSRunJob
Wait for a called job to finish DSWaitForJob
Gets the meta data details for the specified link DSGetLinkMetaData
Get information about the current project DSGetProjectInfo
Get buffer size and timeout value for an IPC or Web Service stage DSGetIPCStageProps
Get information about the controlled job or current job DSGetJobInfo
Get information about the meta bag properties associated with the named job DSGetJobMetaBag
Get information about a stage in the controlled job or current job DSGetStageInfo
Get the names of the links attached to the specified stage DSGetStageLinks
Get a list of stages of a particular type in a job. DSGetStagesOfType
Get information about the types of stage in a job. DSGetStageTypes
Get information about a link in a controlled job or current job DSGetLinkInfo
Get information about a controlled job's parameters DSGetParamInfo
Get the log event from the job log DSGetLogEntry
Get a number of log events on the specified subject from the job log DSGetLogSummary
Get the newest log event, of a specified type, from the job log DSGetNewestLogId
Log an event to the job log of a different job DSLogEvent
Stop a controlled job DSStopJob
Return a job handle previously obtained from DSAttachJob DSDetachJob
Log a fatal error message in a job's log file and aborts the job. DSLogFatal
Log an information message in a job's log file. DSLogInfo
Put an info message in the job log of a job controlling current job. DSLogToController
Log a warning message in a job's log file. DSLogWarn
Generate a string describing the complete status of a valid attached job. DSMakeJobReport
Insert arguments into the message template. DSMakeMsg
Ensure a job is in the correct state to be run or validated. DSPrepareJob
Interface to system send mail facility. DSSendMail
Log a warning message to a job log file. DSTransformError
Convert a job control status or error code into an explanatory text message. DSTranslateCode
Suspend a job until a named file either exists or does not exist. DSWaitForFile
Checks if a BASIC routine is cataloged, either in VOC as a callable item, or in the catalog space. DSCheckRoutine
Execute a DOS or Data Stage Engine command from a before/after subroutine. DSExecute
Set a status message for a job to return as a termination message when it finishes DSSetUserStatus
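As an illustration, a job control routine that runs one job with a parameter might look like the following sketch (the job name, parameter name and values are hypothetical, and error handling is kept to a minimum):

* Attach the job we want to control, aborting this routine on a fatal error
hJob = DSAttachJob("LoadCustomerDim", DSJ.ERRFATAL)

* Set a job parameter and request a normal run
ErrCode = DSSetParam(hJob, "LoadDate", "2006-03-21")
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)

* Wait for the job to finish, then check how it ended
ErrCode = DSWaitForJob(hJob)
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)

If Status = DSJS.RUNFAILED Or Status = DSJS.RUNSTOPPED Then
   Call DSLogWarn("LoadCustomerDim did not finish cleanly", "JobControl")
End

* Give the job handle back
ErrCode = DSDetachJob(hJob)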

Q 42 What are Routines?

Routines are stored in the Routines branch of the Data Stage Repository, where you can

create, view or edit. The following programming components are classified as routines:

Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX

(OLE) functions, Web Service routines
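For instance, the body of a simple transform function (a hedged sketch; the routine would be created with one argument, here called InDate) could be:

* Transform function body: convert an external date in mm/dd/yyyy format
* to yyyy-mm-dd and return it through the Ans variable
Ans = Oconv(Iconv(InDate, "D/MDY[2,2,4]"), "D-YMD[4,2,2]")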

Q 43 What is data stage Transform?

Q 44 What is Meta Brokers?

Q 45 What is usage analysis?

Q 46 What is job sequencer?


Q 47 What are different activities in job sequencer?

Q 48 What are triggers in data Stages? (conditional, unconditional, otherwise)

Q 49 Have you generated job reports?

Q 50 What is plug-in?

Q 51 Have you created any custom transform? Explain. (Oconv)