Week 4

Our next step was to merge the first database with the second. Combining the first CSV file (the flat file) with the converted JSON file was not as easy as we thought: we had to start over several times before we finally succeeded. Just as we were ready to import the final dataset into the business analytics tool, the main computer on which everything was running started crashing with blue screens. We therefore continued with the manual research until the computer was working again. The repair took almost three days.

In the meantime, we researched each company manually, one by one.
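
The merge of the flat file with the converted JSON file can be illustrated with a small Python sketch. It assumes, hypothetically, that both sources share a `company_id` column and performs a left join on it; the sample data, column names and the function `merge_records` are placeholders, not the project's actual files or code. It also guards against one common pitfall: CSV values are always read as strings, while JSON may store the key as a number, so the keys are normalized before matching.

```python
import csv
import io
import json

# Hypothetical stand-ins for the two sources; the real file contents,
# column names and join key are not given in the report.
FLATFILE_CSV = """company_id,company_name,plan
1,Example Corp,Enterprise
2,Sample GmbH,Starter
"""

CONVERTED_JSON = """[
  {"company_id": 1, "country": "US"},
  {"company_id": 3, "country": "DE"}
]"""


def merge_records(csv_text: str, json_text: str, key: str) -> list[dict]:
    """Left-join the CSV rows with the JSON records on `key`.

    Every CSV row is kept; JSON records without a matching CSV row are dropped.
    """
    # CSV values are always strings, while JSON may store the key as a number,
    # so the key is normalized to str on both sides before matching.
    json_by_key = {str(rec[key]): rec for rec in json.loads(json_text)}
    merged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = json_by_key.get(str(row[key]), {})
        # Copy the JSON fields except the key itself, so the CSV value is kept.
        extra = {k: v for k, v in match.items() if k != key}
        merged.append({**row, **extra})
    return merged


if __name__ == "__main__":
    for record in merge_records(FLATFILE_CSV, CONVERTED_JSON, "company_id"):
        print(record)
```

A left join keeps every customer from the flat file even when the JSON file has no matching record; an inner join would silently drop those rows.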

Some of the platforms we used to gather information were:

Crunchbase.com

Crunchbase describes itself as the world’s most comprehensive dataset of startup activity, and it is accessible to everyone. It contains about 650k profiles of people and companies.

Crunchbase, using "Intel" as an example

Datafox.co

DataFox is a platform for tracking the complex technology market. It provides real-time data on private technology companies.

DataFox, using "Intel" as an example

LinkedIn.com

A business-oriented social networking platform.

LinkedIn, using "Intel" as an example

One of the goals of the project is to gather enough information to identify the characteristics of the typical paying customer. Our research covered attributes such as the number of employees, annual revenue, industry, and the country of the headquarters. For this part of the project, the tools and platforms that Transifex provided came in handy.

After removing all redundant records, we worked on improving the data quality before gathering more information about the paying customers. With this information, we enriched our Access database with attributes such as the country of the headquarters, industry, number of employees, and revenue. We also added information about the plans, such as their costs. With this enrichment, the ETL process is almost complete.
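
Deduplication and enrichment can be sketched in the same way. This sketch assumes, as a simplification, that redundant records differ only in the capitalization or whitespace of the company name, and that the manually researched attributes are kept in a lookup table keyed by the normalized name. The functions `normalize`, `deduplicate` and `enrich`, and all names, values and plan costs below, are hypothetical placeholders rather than the project's data.

```python
# Hypothetical merged rows; the second row is a duplicate with different case/spacing.
RAW_ROWS = [
    {"company_name": "Example Corp", "plan": "Enterprise"},
    {"company_name": "  example corp ", "plan": "Enterprise"},
    {"company_name": "Sample GmbH", "plan": "Starter"},
]

# Results of the manual research, keyed by normalized company name (placeholder values).
RESEARCH = {
    "example corp": {"hq_country": "US", "industry": "Software", "employees": 5000},
}

# Placeholder plan costs, not actual Transifex prices.
PLAN_COSTS = {"Starter": 10.0, "Enterprise": 100.0}


def normalize(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def deduplicate(rows: list[dict], key: str) -> list[dict]:
    """Keep the first row for each normalized value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        norm = normalize(row[key])
        if norm not in seen:
            seen.add(norm)
            unique.append(row)
    return unique


def enrich(rows: list[dict], research: dict, plan_costs: dict) -> list[dict]:
    """Add researched company attributes and the plan cost to each row."""
    enriched = []
    for row in rows:
        extra = research.get(normalize(row["company_name"]), {})
        # Unknown plans get None rather than a guessed cost.
        enriched.append({**row, **extra, "plan_cost": plan_costs.get(row["plan"])})
    return enriched


if __name__ == "__main__":
    for record in enrich(deduplicate(RAW_ROWS, "company_name"), RESEARCH, PLAN_COSTS):
        print(record)
```

Deduplicating before enriching means each company is researched and stored only once; real-world duplicates (abbreviations, legal-form suffixes) usually need a fuzzier matching rule than this one.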

Access offers few features for meaningful data analysis. With that in mind, we imported the Access database into SQL Server running on a virtual machine.

The SQL Server database gives us a solid basis for analysis with Microsoft Visual Studio Business Analytics. We therefore created an analytics project in which we defined the data source, the data source views, and the cubes and dimensions.
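
The idea behind a cube can be illustrated, in a much simplified form, as summing a measure over every combination of dimension values. The sketch below is not how the analytics tool computes its cubes internally; it only shows the concept, using hypothetical fact rows with `country`, `industry` and `plan` as dimensions and `plan_cost` as the measure. The function `roll_up` is an illustrative name, not part of any tool.

```python
from collections import defaultdict

# Hypothetical fact rows as they might come out of the enriched database.
FACTS = [
    {"country": "US", "industry": "Software", "plan": "Enterprise", "plan_cost": 100.0},
    {"country": "DE", "industry": "Software", "plan": "Starter", "plan_cost": 10.0},
    {"country": "US", "industry": "Retail", "plan": "Starter", "plan_cost": 10.0},
]


def roll_up(facts: list[dict], dimensions: list[str], measure: str) -> dict:
    """Sum `measure` for every combination of values of the given dimensions.

    With no dimensions, the result is the grand total under the empty tuple.
    """
    cells = defaultdict(float)
    for fact in facts:
        # The tuple of dimension values identifies one cell of the cube.
        coords = tuple(fact[d] for d in dimensions)
        cells[coords] += fact[measure]
    return dict(cells)


if __name__ == "__main__":
    print(roll_up(FACTS, ["country"], "plan_cost"))
    print(roll_up(FACTS, ["country", "plan"], "plan_cost"))
    print(roll_up(FACTS, [], "plan_cost"))
```

Passing an empty dimension list returns the grand total, which roughly corresponds to the top ("All") level of a dimension; adding dimensions drills down into finer cells.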

Next week we will start the OLAP analysis on this data source.