DATASTAGE QUESTIONS 1.
What is the flow of load A) Fact table - Table with Collection of Foreign Keys corresponding to the Primary Keys in Dimensional table. Consists of fields with numeric values. Dimension table - Table with Unique Primary Key. Load - Data should be first loaded into dimensional table. Based on the primary key values in dimensional table, the data should be loaded into Fact table. 2. What is the default cache size? How do you change the cache size if needed? A. Default cache size is 256 MB. We can increase it by going into Datastage Administrator and selecting the Tunable Tab and specify the cache size over there. 3. What are types of Hashed File? A) Hashed File is classified broadly into 2 types. a) Static - Sub divided into 17 types based on Primary Key Pattern. b) Dynamic - sub divided into 2 types i) Generic ii) Specific. Dynamic files do not perform as well as a well, designed static file, but do perform better than a badly designed one. When creating a dynamic file you can specify the following Although all of these have default values) By Default Hashed file is "Dynamic - Type Random 30 D" 4. What does a Config File in parallel extender consist of? A) Config file consists of the following. a) Number of Processes or Nodes. b) Actual Disk Storage Location. 5. What is Modulus and Splitting in Dynamic Hashed File? A. In a Hashed File, the size of the file keeps changing randomly. If the size of the file increases it is called as "Modulus". If the size of the file decreases it is called as "Splitting". 6. What are Stage Variables, Derivations and Constants? A. Stage Variable - An intermediate processing variable that retains value during read and doesn’t pass the value into target column. Derivation - Expression that specifies value to be passed on to the target column. Constant - Conditions that are either true or false that specifies flow of data with a link. 7. Types of views in Datastage Director? There are 3 types of views in Datastage Director a) Job View - Dates of Jobs Compiled. b) Log View - Status of Job last run c) Status View - Warning Messages, Event Messages, Program Generated Messages. 8. Types of Parallel Processing? A) Parallel Processing is broadly classified into 2 types. a) SMP - Symmetrical Multi Processing. b) MPP - Massive Parallel Processing. 9. Orchestrate Vs Datastage Parallel Extender? A) Orchestrate itself is an ETL tool with extensive parallel processing capabilities and running on UNIX platform. Datastage used Orchestrate with Datastage XE (Beta version of 6.0) to incorporate the parallel processing capabilities. Now Datastage has purchased Orchestrate and integrated it with Datastage XE and released a new version Datastage 6.0 i.e Parallel Extender. 10. Importance of Surrogate Key in Data warehousing? A) Surrogate Key is a Primary Key for a Dimension table. Most importance of using it is it is independent of underlying database. i.e. Surrogate Key is not affected by the changes going on with a database. 11. How to run a Shell Script within the scope of a Data stage job? A) By using "ExcecSH" command at Before/After job properties. 12. How to handle Date conversions in Datastage? Convert a mm/dd/yyyy format to yyyy-dd-mm? A) We use a) "Iconv" function - Internal Conversion. b) "Oconv" function - External Conversion. Function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Filedname,"D/MDY[2,2,4]"),"D-MDY[2,2,4]") 13 How do you execute datastage job from command line prompt? A) Using "dsjob" command as follows. dsjob -run -jobstatus projectname jobname 14. Functionality of Link Partitioner and Link Collector? Link Partitioner: It actually splits data into various partitions or data flows using various partition methods. Link Collector: It collects the data coming from partitions, merges it into a single data flow and loads to target. 15. Types of Dimensional Modeling? A) Dimensional modeling is again sub divided into 2 types. a) Star Schema - Simple & Much Faster. Denormalized form. b) Snowflake Schema - Complex with more Granularity. More normalized form. 16. Differentiate Primary Key and Partition Key? ☻Page 4 of 210☻ Primary Key is a combination of unique and not null. It can be a collection of key values called as composite primary key. Partition Key is a just a part of Primary Key. There are several methods of partition like Hash, DB2, and Random etc. While using Hash partition we specify the Partition Key. 17. Differentiate Database data and Data warehouse data? A) Data in a Database is a) Detailed or Transactional b) Both Readable and Writable. c) Current. 18. Containers Usage and Types? Container is a collection of stages used for the purpose of Reusability. There are 2 types of Containers. a) Local Container: Job Specific b) Shared Container: Used in any job within a project. 19. Compare and Contrast ODBC and Plug-In stages? ODBC: a) Poor Performance. b) Can be used for Variety of Databases. c) Can handle Stored Procedures. Plug-In: a) Good Performance. b) Database specific. (Only one database) c) Cannot handle Stored Procedures. 20. Dimension Modelling types along with their significance Data Modelling is Broadly classified into 2 types. a) E-R Diagrams (Entity - Relatioships). b) Dimensional Modelling. Q 21 What are Ascential Dastastage Products, Connectivity Ans: Ascential Products Ascential DataStage Ascential DataStage EE (3) Ascential DataStage EE MVS Ascential DataStage TX Ascential QualityStage Ascential MetaStage Ascential RTI (2) Ascential ProfileStage Ascential AuditStage Ascential Commerce Manager Industry Solutions Connectivity Files RDBMS Real-time PACKs EDI Other Q 22 Explain Data Stage Architecture? Data Stage contains two components, Client Component. Server Component. Client Component: _ Data Stage Administrator. _ Data Stage Manager _ Data Stage Designer _ Data Stage Director Server Components: _ Data Stage Engine _ Meta Data Repository _ Package Installer Data Stage Administrator: Used to create the project. Contains set of properties We can set the buffer size (by default 128 MB) We can increase the buffer size. We can set the Environment Variables. In tunable we have in process and inter-process In-process—Data read in sequentially Inter-process— It reads the data as it comes. It just interfaces to metadata. Data Stage Manager: We can view and edit the Meta data Repository. We can import table definitions. We can export the Data stage components in .xml or .dsx format. We can create routines and transforms We can compile the multiple jobs. Data Stage Designer: We can create the jobs. We can compile the job. We can run the job. We can declare stage variable in transform, we can call routines, transform, macros, functions. We can write constraints. Data Stage Director: We can run the jobs. We can schedule the jobs. (Schedule can be done daily, weekly, monthly, quarterly) We can monitor the jobs. We can release the jobs. Q 23 What is Meta Data Repository? Meta Data is a data about the data. It also contains _ Query statistics _ ETL statistics _ Business subject area _ Source Information _ Target Information _ Source to Target mapping Information. Q 24 What is Data Stage Engine? It is a JAVA engine running at the background. Q 25 What is Dimensional Modeling? Dimensional Modeling is a logical design technique that seeks to present the data in a standard framework that is, intuitive and allows for high performance access. Q 26 What is Star Schema? Star Schema is a de-normalized multi-dimensional model. It contains centralized fact tables surrounded by dimensions table. Dimension Table: It contains a primary key and description about the fact table. Fact Table: It contains foreign keys to the dimension tables, measures and aggregates. Q 27 What is surrogate Key? It is a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table. We can store up to 2 billion record. Q 28 Why we need surrogate key? It is used for integrating the data may help better for primary key. Index maintenance, joins, table size, key updates, disconnected inserts and partitioning. Q 29 What is Snowflake schema? It is partially normalized dimensional model in which at two represents least one dimension or more hierarchy related tables. Q 30 Explain Types of Fact Tables? Factless Fact: It contains only foreign keys to the dimension tables. Additive Fact: Measures can be added across any dimensions. Semi-Additive: Measures can be added across some dimensions. Eg, % age, discount Non-Additive: Measures cannot be added across any dimensions. Eg, Average ☻Page 8 of 210☻ Conformed Fact: The equation or the measures of the two fact tables are the same under the facts are measured across the dimensions with a same set of measures. Q 31 Explain the Types of Dimension Tables? Conformed Dimension: If a dimension table is connected to more than one fact table, the granularity that is defined in the dimension table is common across between the fact tables. Junk Dimension: The Dimension table, which contains only flags. Monster Dimension: If rapidly changes in Dimension are known as Monster Dimension. De-generative Dimension: It is line item-oriented fact table design. Q 32 What are stage variables? Stage variables are declaratives in Transformer Stage used to store values. Stage variables are active at the run time. (Because memory is allocated at the run time). Q 33 What is sequencer? It sets the sequence of execution of server jobs. Q 34 What are Active and Passive stages? Active Stage: Active stage model the flow of data and provide mechanisms for combining data streams, aggregating data and converting data from one data type to another. Eg, Transformer, aggregator, sort, Row Merger etc. Passive Stage: A Passive stage handles access to Database for the extraction or writing of data. Eg, IPC stage, File types, Universe, Unidata, DRS stage etc. Q 35 What is ODS? Operational Data Store is a staging area where data can be rolled back. Q 36 What are Macros? They are built from Data Stage functions and do not require arguments. A number of macros are provided in the JOBCONTROL.H file to facilitate getting information about the current job, and links and stages belonging to the current job. These can be used in expressions (for example for use in Transformer stages), job control routines, filenames and table names, and before/after subroutines. These macros provide the functionality of using the DSGetProjectInfo, DSGetJobInfo, DSGetStageInfo, and DSGetLinkInfo functions with the DSJ.ME token as the JobHandle and can be used in all active stages and before/after subroutines. The macros provide the functionality for all the possible InfoType arguments for the DSGet…Info functions. See the Function call help topics for more details. The available macros are: DSHostName DSProjectName DSJobStatus DSJobName ☻Page 9 of 210☻ DSJobController DSJobStartDate DSJobStartTime DSJobStartTimestamp DSJobWaveNo DSJobInvocations DSJobInvocationId DSStageName DSStageLastErr DSStageType DSStageInRowNum DSStageVarList DSLinkRowCount DSLinkLastErr DSLinkName 1) Examples 2) To obtain the name of the current job: 3) MyName = DSJobName To obtain the full current stage name: MyName = DSJobName : ″.″ : DSStageName Q 37 What is keyMgtGetNextValue? It is a Built-in transform it generates Sequential numbers. Its input type is literal string & output type is string. Q 38 What are stages? The stages are either passive or active stages. Passive stages handle access to databases for extracting or writing data. Active stages model the flow of data and provide mechanisms for combining data streams, aggregating data, and converting data from one data type to another. Q 39 What index is created on Data Warehouse? Bitmap index is created in Data Warehouse. Q 40 What is container? A container is a group of stages and links. Containers enable you to simplify and modularize your server job designs by replacing complex areas of the diagram with a single container stage. You can also use shared containers as a way of incorporating server job functionality into parallel jobs. DataStage provides two types of container: Local containers. These are created within a job and are only accessible by that job. A local container is edited in a tabbed page of the job’s Diagram window. Shared containers. These are created separately and are stored in the Repository in the same way that jobs are. There are two types of shared container Q 41 What is function? ( Job Control – Examples of Transform Functions ) Functions take arguments and return a value. _ BASIC functions: A function performs mathematical or string manipulations on the arguments supplied to it, and return a value. Some functions have 0 arguments; most have 1 or more. Arguments are always in parentheses, separated by commas, as shown in this general syntax: FunctionName (argument, argument) _ DataStage BASIC functions: These functions can be used in a job control routine, which is defined as part of a job’s properties and allows other jobs to be run and controlled from the first job. Some of the functions can also be used for getting status information on the current job; these are useful in active stage expressions and before- and after-stage subroutines. To do this ... Use this function ... Specify the job you want to control DSAttachJob Set parameters for the job you want to control DSSetParam Set limits for the job you want to control DSSetJobLimit Request that a job is run DSRunJob Wait for a called job to finish DSWaitForJob Gets the meta data details for the specified link DSGetLinkMetaData Get information about the current project DSGetProjectInfo Get buffer size and timeout value for an IPC or Web Service stage DSGetIPCStageProps Get information about the controlled job or current job DSGetJobInfo Get information about the meta bag properties associated with the named job DSGetJobMetaBag Get information about a stage in the controlled job or current job DSGetStageInfo Get the names of the links attached to the specified stage DSGetStageLinks Get a list of stages of a particular type in a job. DSGetStagesOfType Get information about the types of stage in a job. DSGetStageTypes Get information about a link in a controlled job or current job DSGetLinkInfo Get information about a controlled job’s parameters DSGetParamInfo Get the log event from the job log DSGetLogEntry Get a number of log events on the specified subject from the job log DSGetLogSummary Get the newest log event, of a specified type, from the job log DSGetNewestLogId Log an event to the job log of a different job DSLogEvent Stop a controlled job DSStopJob Return a job handle previously obtained from DSAttachJob DSDetachJob Log a fatal error message in a job's log file and aborts the job. DSLogFatal Log an information message in a job's log file. DSLogInfo Put an info message in the job log of a job controlling current job. DSLogToController Log a warning message in a job's log file. DSLogWarn Generate a string describing the complete status of a valid attached job. DSMakeJobReport Insert arguments into the message template. DSMakeMsg Ensure a job is in the correct state to be run or validated. DSPrepareJob Interface to system send mail facility. DSSendMail Log a warning message to a job log file. DSTransformError Convert a job control status or error code into an explanatory text message. DSTranslateCode Suspend a job until a named file either exists or does not exist. DSWaitForFile Checks if a BASIC routine is cataloged, either in VOC as a callable item, or in the catalog space. DSCheckRoutine Execute a DOS or Data Stage Engine command from a before/after subroutine. DSExecute Set a status message for a job to return as a termination message when it finishes DSSetUserStatus Q 42 What is Routines? Routines are stored in the Routines branch of the Data Stage Repository, where you can create, view or edit. The following programming components are classified as routines: Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX (OLE) functions, Web Service routines Q 43 What is data stage Transform? Q 44 What is Meta Brokers? Q 45 What is usage analysis? Q 46 What is job sequencer? ☻Page 12 of 210☻ Q 47 What are different activities in job sequencer? Q 48 What are triggers in data Stages? (conditional, unconditional, otherwise) Q 49 Are u generated job Reports? S Q 50 What is plug-in? Q 51 Have u created any custom transform? Explain? (Oconv) |