INFORMATICA vs DATASTAGE

System Requirement

- Platform support: Informatica runs on Win NT/UNIX; DataStage runs on Win NT/UNIX and more platforms.

Deployment Facility

- Ability to handle initial deployment, major releases, minor releases, and patches with equal ease: Informatica yes; DataStage no.
  Note: My experience has been that INFA is definitely easier to implement initially and upgrade. Ascential has done a good job in recent releases to improve, but IMHO INFA still does this better.

Transformations

- Number of available transformation functions: Informatica 58; DataStage 28.
  Note: DS has many more canned transformation functions than 28. I'm not sure what leads you to this number, but I'd recheck it if I were you.
- Support for looping over the source rows (for/while loop): Informatica supports comparing against the immediately previous record; DataStage does not support this.
- Slowly Changing Dimensions: Informatica supports full history, recent values, and current & previous values; DataStage supports SCDs only through custom scripts and does not have a wizard for this.
  Note: DS has a component called ProfileStage that handles this type of comparison. You'll want to use it judiciously in your production processing because it takes extra resources, but I have found it to be very useful.
- Time dimension generation: neither product supports it.
- Rejected records: Informatica can capture them in a separate file; DataStage cannot.
  Note: DS absolutely has the ability to capture rejected records in a separate file. That's a pretty basic capability, and I don't know of any ETL tool that can't do it.
- Debugging facility: Informatica not supported; DataStage supports basic debugging facilities for testing.

Application Integration Functionality

- Support for real-time data exchange: not available in either product.
  Note: The 7.x version of DS has a component to handle real-time data exchange. I have not personally used it yet, but you should look into it. I think it is called RTE.
- Support for CORBA/XML: neither product supports it.

Metadata

- Ability to view and navigate metadata on the web: Informatica does not support this; DataStage job sessions can be monitored using Informatica classes.
  Note: This is completely not true. DS has a very strong metadata component (MetaStage) that works not only with DS but also has plug-ins to work with modeling tools (like ERwin) and BI tools (like Cognos). This is one of their strong suits (again, IMHO).
- Ability to customize views of metadata for different users (DBA vs. business user): Informatica supports this; DataStage does not.
  Note: Also not true. MetaStage allows publishing of metadata in HTML format for different types of users. It is completely customizable.
- Metadata repository can be stored in an RDBMS: Informatica yes; DataStage no, but the proprietary metadata can be moved to an RDBMS using the DOC Tool.

Support and Maintenance

- Command line operation: Informatica has pmcmd, a server interface for the command line; DataStage has none.
- Ability to maintain versions of mappings: Informatica yes; DataStage no.
  Note: Not true. This has been a weak spot for DS in past releases, but the 7.x version of DS has a good versioning tool.

Job Controlling & Scheduling

- Alerts such as sending mail: Informatica supports them; DataStage does not support them directly (no built-in option), but it is possible to call custom programs after the job executes (a sketch of such a post-job notification follows).
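To make the "call a custom program after the job executes" workaround concrete, here is a minimal Python sketch of a post-job notification script. Nothing here is DataStage-specific; the SMTP host, addresses, and job name are placeholders, and the scheduler is assumed to invoke this script with the real job name and status.

    import smtplib
    from email.message import EmailMessage

    def notify(job_name: str, status: str) -> None:
        """Email a simple job-completion alert. Host and addresses are placeholders."""
        msg = EmailMessage()
        msg["Subject"] = f"ETL job {job_name} finished with status {status}"
        msg["From"] = "etl-alerts@example.com"    # placeholder sender
        msg["To"] = "oncall@example.com"          # placeholder recipient
        msg.set_content(f"Job {job_name} completed. Status: {status}.")
        with smtplib.SMTP("mailhost.example.com") as smtp:  # placeholder SMTP host
            smtp.send_message(msg)

    if __name__ == "__main__":
        # In practice the scheduler passes the real job name and exit status.
        notify("load_customer_dim", "OK")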


Comparison Between DataStage (Server Edition) and Informatica

1. DataStage is closely integrated with its repository (the UniVerse database); Informatica is not. With the introduction of the repository server, Informatica has isolated server calls from repository calls to reduce the load. This has both advantages and disadvantages.

2. DataStage has a more powerful transformation engine, using functions (Oconv and Iconv) and routines; you can do almost any transformation. Informatica is more visual and programmer friendly.

3. Lookups are much faster in DataStage because of the way hash files are built, and you can tune the hash files for optimal performance (see the hashed-lookup sketch after this list).

4. DataStage best practices call for landing the data between transformations and keeping jobs smaller and simpler. A job, when compiled, generates a BASIC routine, and the bigger the routine, the slower the job performs. For a simple project, you might end up with three to four times more jobs in DataStage than in Informatica.

5. DataStage does not perform very well with heterogeneous sources. You might end up extracting data from all the sources, putting it into a hash file, and then starting your transformation. This may not be the case with Informatica.

6. Both DataStage and Informatica support XML. DataStage comes with XML input, transformation, and output stages.

7. Both products have an unlimited number of transformation functions, since you can easily write your own using the command interface.

8. Both products have options for integrating with ERP systems such as SAP, PeopleSoft, and Siebel, but these come at a significant extra cost, so you may need to evaluate them. SAP is a reseller of DataStage for SAP BW, and PeopleSoft bundles DataStage in its EPM products.

9. DataStage has some very good debugging facilities, including the ability to step through a job link by link or row by row and watch data values as a job executes, as well as server-side tracing.

10. DataStage 7.x releases have intelligent assistants (wizards) for creating template jobs for each type of slowly changing dimension table load. The DataStage Best Practices course also provides training in DW loading with SCD and surrogate key techniques.

11. Ascential and Informatica both have robust metadata management products. Ascential MetaStage comes bundled free with DataStage Enterprise and manages metadata via a hub-and-spoke architecture. It can import metadata from a wide range of databases and modeling tools and has a high degree of interaction with DataStage for operational metadata. Informatica SuperGlue was released last year and is rated more highly by Gartner in the metadata field; it integrates closely with PowerCenter products. Both support multiple views (business and technical) of metadata plus the functions you would expect, such as impact analysis, semantics, and data lineage.

12. DataStage can send emails; the sequence job has an email stage that is easy to configure. DataStage 7.5 also has new mobile device support that can administer DataStage jobs via a Palm Pilot, and there are third-party web-based tools that let you run and review jobs over a browser. We can send SMS admin messages from a DataStage UNIX server.

13. DataStage has a command line interface. The dsjob command can be used by any scheduling tool, or from the command line, to run jobs and check the results and logs of jobs (see the dsjob wrapper sketch after this list).


14. Both products integrate well with Trillium for data quality; DataStage also integrates with QualityStage, which is the preferred method of address cleansing and fuzzy matching.

15. Deployment facility: ability to handle initial deployment, major and minor releases, and patches with ease.
    - Informatica: Yes.
    - DataStage: No.

16. Support for looping over the source rows (for/while loop).
    - Informatica: Supports comparing against the immediately previous record.
    - DataStage: Does not support.

17. Slowly Changing Dimension.
    - Informatica: Supports full history, recent values, and current & previous values.
    - DataStage: Supports only through custom scripts; does not have a wizard for this.

18. Time dimension generation.
    - Informatica: Does not support.
    - DataStage: Does not support.

19. Rejected records.
    - Informatica: Can be captured.
    - DataStage: Can be captured in a separate file.

20. Debugging facility.
    - Informatica: Does not support.
    - DataStage: Supports basic debugging facilities for testing.

21. Ability to customize views of metadata for different users (DBA vs. business user).
    - Informatica: Supports.
    - DataStage: Supports.

22. Metadata repository can be stored in an RDBMS.
    - Informatica: Yes.
    - DataStage: No.

23. Support and maintenance: command line operation.
    - Informatica: Yes (pmcmd).
    - DataStage: Yes (dsjob).

24. Ability to maintain versions of mappings/jobs.
    - Informatica: Yes.
    - DataStage: Yes.

25. Job controlling & scheduling.
    - Informatica: Yes.
    - DataStage: Yes.
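To make point 3 concrete: a hash file is essentially a table keyed on the lookup columns, so each probe costs roughly constant time instead of a scan of the reference data. Here is a minimal, tool-agnostic Python sketch of the same idea; the table names and rows are invented for illustration and say nothing about DataStage's internal format.

    # Build the lookup once, keyed on the join column (analogous to a hash file).
    customers = [
        {"cust_key": "BS001", "city": "Tampa", "state": "FL"},
        {"cust_key": "LJ004", "city": "Miami", "state": "FL"},
    ]
    lookup = {row["cust_key"]: row for row in customers}  # O(n) one-time build

    orders = [
        {"order_id": 1, "cust_key": "BS001", "amount": 120.0},
        {"order_id": 2, "cust_key": "LJ004", "amount": 35.5},
    ]

    # Each probe is a constant-time hash lookup rather than a scan per source row.
    for order in orders:
        dim = lookup.get(order["cust_key"])
        order["state"] = dim["state"] if dim else None

    print(orders)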
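And for point 13, a sketch of driving dsjob from a scheduler-side Python wrapper. The -run, -jobstatus, and -logsum options are standard dsjob flags in the 7.x documentation, but verify the exact syntax against your release; the project and job names here are placeholders.

    import subprocess

    def run_job(project: str, job: str) -> int:
        """Run a DataStage job via dsjob and wait for it.
        With -jobstatus, dsjob's exit code reflects the job's finishing status."""
        result = subprocess.run(
            ["dsjob", "-run", "-jobstatus", project, job],
            capture_output=True, text=True,
        )
        return result.returncode

    def log_summary(project: str, job: str) -> str:
        """Fetch the job's log summary for post-run checking."""
        result = subprocess.run(
            ["dsjob", "-logsum", project, job],
            capture_output=True, text=True,
        )
        return result.stdout

    if __name__ == "__main__":
        status = run_job("MyProject", "load_customer_dim")  # placeholder names
        print(log_summary("MyProject", "load_customer_dim"))
        raise SystemExit(status)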



Further mistakes in your comparison, mainly from a DataStage-based angle, as my experience is with that product:

- Both DataStage and Informatica support XML. DataStage comes with XML input, transformation, and output stages.

- Both products have an unlimited number of transformation functions, since you can easily write your own using the command interface.

- Both products have options for integrating with ERP systems such as SAP, PeopleSoft, and Siebel, but these come at a significant extra cost, so you may need to evaluate them. SAP is a reseller of DataStage for SAP BW, and PeopleSoft bundles DataStage in its EPM products.

- DataStage has some very good debugging facilities, including the ability to step through a job link by link or row by row and watch data values as a job executes, as well as server-side tracing.

- DataStage 7.x releases have intelligent assistants (wizards) for creating template jobs for each type of slowly changing dimension table load. The DataStage Best Practices course also provides training in DW loading with SCD and surrogate key techniques.

- Ascential and Informatica both have robust metadata management products. Ascential MetaStage comes bundled free with DataStage Enterprise and manages metadata via a hub-and-spoke architecture. It can import metadata from a wide range of databases and modeling tools and has a high degree of interaction with DataStage for operational metadata. Informatica SuperGlue was released last year and is rated more highly by Gartner in the metadata field; it integrates closely with PowerCenter products. Both support multiple views (business and technical) of metadata plus the functions you would expect, such as impact analysis, semantics, and data lineage.

- DataStage can send emails; the sequence job has an email stage that is easy to configure. DataStage 7.5 also has new mobile device support so you can administer your DataStage jobs via a Palm Pilot. There are also third-party web-based tools that let you run and review jobs over a browser. I found it easy to send SMS admin messages from a DataStage UNIX server.

- DataStage has a command line interface. The dsjob command can be used by any scheduling tool, or from the command line, to run jobs and check the results and logs of jobs.

- Both products integrate well with Trillium for data quality; DataStage also integrates with QualityStage, which is the preferred method of address cleansing and fuzzy matching.

Milind, I've got to ask: where are you getting your information from? I have done ETL tool comparisons for several clients over the past seven or so years. They are both good tools with different strengths, so it really depends on what your organization's needs and priorities are as to which one is "better". I have spent much more time in the past couple of years on DS than INFA, so I don't feel I can speak to the changes INFA has made lately, but I know you have incorrect info about DS.

I am currently working with a client on DS v7.1. I've made a few comments on the more glaring inaccuracies and on topics where I have up-to-date experience. I suggest you re-research and perhaps do a proof of concept with each vendor.

FYI: I don't know if you have looked at the Parallel Extender component of DS 7.x, but it is a terrific capability if you have challenges with meeting availability requirements. It is one of the most impressive changes Ascential has made lately (IMHO).

Gartner has vendor reports on Ascential and Informatica. They also have a Magic Quadrant that lists both DataStage and Informatica as the clear market leaders. I don't think you can go wrong with either product; it comes down to whether you can access experts in these products for your project and what options you have for training. I think if you go into a major project with either product and you don't have an expert on your team, it can go badly wrong.




How Should We Implement A Slowly Changing Dimension?

Currently, our data warehouse has only Type 1 Slowly Changing Dimensions (SCD). That is to say, we overwrite the dimension record with every update. The problem with that is that when data changes, it changes for all history; while this is valid for data-entry corrections, it may not be valid for all data. An acceptable example could be Customer Date of Birth: if the date of birth was changed, chances are the original data was incorrect.

However, if the customer address were changed, this may, and probably does, mean the customer moved. If we simply overwrite the address, then all sales for that customer will belong to the new address. Suppose the customer moved from Florida to Ohio. If we were trying to track sales patterns by region, all of the customer's purchases that were made in Florida would now appear to have been made in Ohio.

Type 1 Slowly Changing Dimension

Customer Dimension

  ID    CustKey  Name        DOB         City   State
  1001  BS001    Bob Smith   6/8/1961    Tampa  FL
  1002  LJ004    Lisa Jones  10/15/1954  Miami  FL

Customer Dimension After Edits

  ID    CustKey  Name        DOB         City    State
  1001  BS001    Bob Smith   6/8/1961    Dayton  OH
  1002  LJ004    Lisa Jones  10/15/1954  Miami   FL

In the example above, the DOB change doesn't affect any dimensional reporting facts. However, the City/State change would have an effect: now all sales for Bob Smith would appear to come from Dayton, Ohio rather than from Tampa, Florida.

The solution we have chosen for solving this problem is to implement a Type 2 slowly changing dimension. A Type 2 SCD records a separate row each time a value is changed in the dimension. In our case, we are declaring that we will only create a new dimension record when certain columns are changed. In the example above, we would not record a new record for the DOB change, but we would for the address change.

Type 2 Slowly Changing Dimension

Customer Dimension

  ID    CustKey  Name        DOB         City   St  Curr  Effective Date
  1001  BS001    Bob Smith   6/8/1961    Tampa  FL  Y     5/1/2004
  1002  LJ004    Lisa Jones  10/15/1954  Miami  FL  Y     5/2/2004

Customer Dimension After Edits

  ID    CustKey  Name        DOB         City    St  Curr  Effective Date
  1001  BS001    Bob Smith   6/8/1961    Tampa   FL  N     5/1/2004
  1002  LJ004    Lisa Jones  10/15/1954  Miami   FL  Y     5/2/2004
  1003  BS001    Bob Smith   6/8/1961    Dayton  OH  Y     5/27/2004

As you can see, there are now two dimension records for Bob Smith. They both have the same CustKey value, but they have different ID values. All future fact table rows will use the new ID to link to the Customer dimension. This is accomplished by the use of the Current Flag: the ETL process looks only at the current flag when recording new orders. However, in the case of an update to an order, the Effective Date must be used to determine which customer record the update applies to (a sketch of this logic follows).
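Here is a minimal Python sketch of the load logic just described, under the assumptions made in this section: the address columns (City, St) are tracked as Type 2, DOB changes are simply overwritten, and the current flag plus effective date drive the versioning. The table layout and helper names are invented for illustration.

    from datetime import date

    TRACKED = ("city", "st")  # columns whose changes create a new version

    def apply_change(dim, next_id, incoming, today):
        """Apply one incoming record to the Type 2 dimension (a list of dicts).
        Assumes the customer already has a current row in the dimension."""
        current = next(r for r in dim
                       if r["cust_key"] == incoming["cust_key"] and r["curr"] == "Y")
        if any(current[c] != incoming[c] for c in TRACKED):
            current["curr"] = "N"                      # expire the old version
            dim.append({**current, **incoming, "id": next_id,
                        "curr": "Y", "eff_date": today})
            next_id += 1
        else:
            current["dob"] = incoming["dob"]           # untracked change: overwrite
        return next_id

    def key_as_of(dim, cust_key, as_of):
        """Point-in-time lookup: the version in effect on a date (for order updates)."""
        versions = [r for r in dim
                    if r["cust_key"] == cust_key and r["eff_date"] <= as_of]
        return max(versions, key=lambda r: r["eff_date"])["id"]

    dim = [{"id": 1001, "cust_key": "BS001", "name": "Bob Smith", "dob": "6/8/1961",
            "city": "Tampa", "st": "FL", "curr": "Y", "eff_date": date(2004, 5, 1)}]
    next_id = apply_change(dim, 1003, {"cust_key": "BS001", "dob": "6/8/1961",
                                       "city": "Dayton", "st": "OH"}, date(2004, 5, 27))
    print(key_as_of(dim, "BS001", date(2004, 5, 15)))  # -> 1001 (the Tampa-era version)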

The primary issue with a Type 2 SCD is that the volume of data grows as more changes are tracked, and this can impact performance in a star schema. The principle behind the star schema design is that fact tables have few columns but many rows, and they only have to perform single-level joins to resolve their dimensions. The assumption is that the dimensions have lots of columns but relatively few rows. This allows for very fast joining of data (a small illustration follows).
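For illustration, here is the shape of that single-level join in pandas, as a stand-in for the database's star join; the frames and column names are invented. A narrow, tall fact table is merged once against a wide, short dimension.

    import pandas as pd

    # Fact: few columns, many rows (three shown here for brevity).
    fact = pd.DataFrame({"cust_id": [1001, 1002, 1001],
                         "amount": [120.0, 35.5, 80.0]})

    # Dimension: many descriptive columns, relatively few rows.
    dim = pd.DataFrame({"cust_id": [1001, 1002],
                        "name": ["Bob Smith", "Lisa Jones"],
                        "city": ["Tampa", "Miami"],
                        "state": ["FL", "FL"]})

    # One single-level join resolves every dimensional attribute for the fact rows.
    print(fact.merge(dim, on="cust_id", how="left"))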

Conforming Dimensions

For the purposes of this discussion, conforming dimensions only need a brief definition. Conforming dimensions are a feature of star schemas that allow facts to share dimensional data. A conforming dimension occurs when two dimensions share the same keys, though they often have different attributes. The goal is to ensure that any fact table can link to the conforming dimension and consume its data so long as the dimension is relevant.

Conforming Dimension

Customer Dimension

  ID    CustKey  Name        DOB         City   State
  1001  BS001    Bob Smith   6/8/1961    Tampa  FL
  1002  LJ004    Lisa Jones  10/15/1954  Miami  FL

Billing Dimension

  ID    Bill2Ky  Name        Account Type  Credit Limit  CustKey
  1001  9211     Bob Smith   Credit        $10,000       BS001
  1002  23421    Lisa Jones  Cash          $100          LJ004

In the example above, we could use the ID from the Customer dimension in a fact, and in the future a link to the Billing dimension could be established without having to reload the data.

We are considering a slight modification to the standard Type 2 SCD. The idea is to maintain two dimensions: one as a Type 1 and one as a Type 2. The problem with this is that we lose the ability to use conforming dimensions.

Type 2 and Type 1 Slowly Changing Dimension

Customer Dimension Type 1

  ID    CustKey  Name        DOB         City    St  Curr  Effective Date
  1001  BS001    Bob Smith   6/8/1961    Dayton  OH  Y     5/1/2004
  1002  LJ004    Lisa Jones  10/15/1957  Miami   FL  Y     5/2/2004

Customer Dimension Type 2

  ID    CustKey  Name        DOB         City    St  Curr  Effective Date
  1001  BS001    Bob Smith   6/8/1961    Tampa   FL  N     5/1/2004
  1002  LJ004    Lisa Jones  10/15/1957  Miami   FL  Y     5/2/2004
  1003  BS001    Bob Smith   6/8/1961    Dayton  OH  Y     5/27/2004

As you can see, the current ID for Bob Smith in the Type 1 SCD is 1001, while it is 1003 in the Type 2 SCD. These dimensions do not conform.

Our solution is to create a composite key for the Type 2 SCD.

Type 2 and Type 1 Slowly Changing Dimension

Customer Dimension Type 1

  ID    CustKey  Name        DOB         City    St
  1001  BS001    Bob Smith   6/8/1961    Dayton  OH
  1002  LJ004    Lisa Jones  10/15/1957  Miami   FL

Customer Dimension Type 2

  ID    SubKey  CustKey  Name        DOB         City    St  Curr  Eff Date
  1001  001     BS001    Bob Smith   6/8/1961    Tampa   FL  N     5/1/2004
  1002  001     LJ004    Lisa Jones  10/15/1957  Miami   FL  Y     5/2/2004
  1001  002     BS001    Bob Smith   6/8/1961    Dayton  OH  Y     5/27/2004

In the example above, the Type 1 and the Type 2 dimensions conform at the ID level. If a fact needs the historical data, it will consume both the ID and the SubKey (see the sketch below).
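A small Python sketch of how the composite key restores conformance, using the rows above. The variable names are invented; the point is that current-state facts join on ID alone (and therefore conform with the Type 1 dimension), while historical facts carry (ID, SubKey) to pin a specific version.

    # Type 2 dimension keyed by the composite (id, sub_key), data from the tables above.
    type2 = {
        (1001, "001"): {"cust_key": "BS001", "city": "Tampa",  "st": "FL", "curr": "N"},
        (1002, "001"): {"cust_key": "LJ004", "city": "Miami",  "st": "FL", "curr": "Y"},
        (1001, "002"): {"cust_key": "BS001", "city": "Dayton", "st": "OH", "curr": "Y"},
    }

    # Current view: one row per ID (curr == 'Y'), so it conforms with the Type 1 dimension.
    current = {id_: row for (id_, sub), row in type2.items() if row["curr"] == "Y"}

    # A current-state fact needs only the ID...
    fact_now = {"order_id": 7, "cust_id": 1001}
    print(current[fact_now["cust_id"]]["city"])        # -> Dayton (today's address)

    # ...while a historical fact carries the composite key to pin the old version.
    fact_hist = {"order_id": 3, "cust_id": 1001, "sub_key": "001"}
    print(type2[(fact_hist["cust_id"], fact_hist["sub_key"])]["city"])  # -> Tampa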