SQL SERVER – SSIS

                       Explain architecture of SSIS?

SSIS architecture consists of four key parts:

a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.

b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications.

c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.

d) Data flow engine: provides the in-memory buffers that move data from source to destination.

What are the tools associated with SSIS?

We use Business Intelligence Development Studio (BIDS) and SQL Server Management Studio (SSMS) to work with Development of SSIS Projects.

We use SSMS to manage the SSIS Packages and Projects.

What is the Control Flow?

When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components.  The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.

 

What is a workflow in SSIS ?

Workflow is a set of instructions on to specify the Program Executor on how to execute tasks and containers within SSIS Packages.

 

What is the Data Flow Engine?

The Data Flow Engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from the source to the destination and performing transformations (lookups, data cleansing etc.).  Data flow uses memory oriented architecture, called buffers, during the data flow and transformations which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in buffers (in-memory), does the requested transformations in the buffers and writes to the destination. The benefit is that it provides the fastest transformation as it happens in memory and we don't need to stage the data for transformations in most cases

 

What is a Transformation?

 A transformation simply means bringing in the data in a desired format. For example you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are  removed.  Anther example is if you have master/reference data and want to pull only related data from the source and hence you need some sort of lookup. There are around 30 transformation tasks available and this can be extended further with custom built tasks if needed.

 

What are the Transformations available in SSIS?

AGGEGATE  - It applies aggregate functions to Record Sets to produce new output records from aggregated values.

AUDIT  - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..

CHARACTER MAP - Performs SQL Server level makes string data changes such as changing data from lower case to upper case.

CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.

COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.

DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.

DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.

DERIVED COLUMN - Create a new (computed) column from given expressions.

EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.

FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.

FUZZY LOOKUP -  Used for Pattern Matching and Ranking based on fuzzy logic.

IMPORT COLUMN - Reads image specific column from database onto a flat file.

LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.

MERGE - Merges two sorted data sets into a single data set into a single data flow.

MERGE JOIN - Merges two data sets into a single dataset using a join junction.

MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.

ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.

ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.

UNION ALL - Merge multiple data sets into a single dataset.

PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns

UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.

 

What is a container and how many types of containers are there?

A container is a logical grouping of tasks which allows you to manage the scope of the tasks together. These are the types of containers in SSIS:

·     Sequence Container - Used for grouping logically related tasks together

·   For Loop Container - Used when you want to have repeating flow in package

·   For Each Loop Container - Used for enumerating each object in a collection; for example a record set or a list of files.

 

What are variables and what is variable scope ?

Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. Two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services

 

Difference between Union all and Merge Join?

a) Merge transformation can accept only two inputs whereas Union all can take more than two inputs

b) Data has to be sorted before Merge Transformation whereas Union all doesn't have any condition like that.

Difference between asynchronous and synchronous transformations?

Asynchronous transformation have different Input and Output buffers and it is up to the component designer in an Async component to provide a column structure to the output buffer and hook up the data from the input.

How to achieve parallelism in SSIS?

Parallelism is achieved using MaxConcurrentExecutable property of the package. Its default is -1 and is calculated as number of processors + 2.

How do you do incremental load?

Fastest way to do incremental load is by using Timestamp column in source table and then storing last ETL timestamp, In ETL process pick all the rows having Timestamp greater than the stored Timestamp so as to pick only new and updated records

What are the differences between DTS and SSIS

 

Can you name five of the Perfmon counters for SSIS and the value they provide?

§                       SQLServer:SSIS Service

§                       SSIS Package Instances

§                       SQLServer:SSIS Pipeline

§                       BLOB bytes read

§                       BLOB bytes written

§                       BLOB files in use

§                       Buffer memory

§                       Buffers in use

§                       Buffers spooled

§                       Flat buffer memory

§                       Flat buffers in use

§                       Private buffer memory

§                       Private buffers in use

§                       Rows read

§                       Rows written

 

How do you deploy SSIS packages.

BUILDing SSIS Projects provides a Deployment Manifest File. We need to run the manifest file and decide whether to deploy this onto File System or onto SQL Server [ msdb]. SQL Server Deployment is very faster and more secure then File System Deployment. Alternatively, we can also import the package from SSMS from File System or SQ Server.

How would you do Error Handling?

A SSIS package could mainly have two types of errors

a) Procedure Error: Can be handled in Control flow through the precedence control and redirecting the execution flow.

b) Data Error: is handled in DATA FLOW TASK buy redirecting the data flow using Error Output of a component

Where are SSIS package stored in the SQL Server?

MSDB.sysdtspackages90 stores the actual content and ssydtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog do the supporting roles.

Business Intelligence Development Studio (BIDS) Example document attached.

Creating projects and solutions with SSIS  

Overview of projects and solutions in SSIS   

Creating a Simple SSIS Package    

SQL Server Integration Services (SSIS) Connection Managers   

SQL Server Integration Services (SSIS) Control Flow   

SQL Server Integration Services (SSIS) Data Flow   

Executing a Package in BIDS 

Deploying SSIS Packages  

SQL Server Integration Services (SSIS) Deployment Utility   

Command line deployment tool for SSIS packages   

Deploying SSIS packages with SQL Server Management Studio      

Scheduling SSIS packages with SQL Server Agent 

  Attached Related Documents  Below :