For the internship experience, we were to assume the role of a junior analyst in a Data Analytics Consulting team. The brief was that a client, Sprocket Central Pty Ltd, had approached the firm for help analysing its data and for strategic advice on its new expansion plan. Sprocket Central Pty Ltd is a hypothetical company that sells bicycles and bicycle accessories. The task was broken down into three (3) modules:
1. Data Quality Assessment
2. Data Insights
3. Data Insights and Presentation
"Your task is to take a look at the following datasets provided by Sprocket Central Pty Ltd and identify all data quality issues. Once you've had a look at these datasets, draft an email to the client identifying all data quality issues."
Four datasets were provided in CSV format: Transactions, Customer Demographics, Customer Address, and a new list of customers from which to select targets for the market expansion.
I examined the datasets against a set of data quality guidelines that were provided. I then drafted a sample email to my Team Lead highlighting the data quality issues I had observed and how I intended to approach the data cleaning.
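The guidelines themselves are not reproduced here, but common data quality dimensions include completeness, validity and uniqueness. The Python sketch below shows what a check along those three dimensions can look like, assuming each dataset is loaded as rows of dictionaries. The column names (`customer_id`, `gender`, `DOB`), the allowed gender values and the `quality_report` helper are hypothetical illustrations, not the assessment that was actually carried out in Excel.

```python
import csv
import io
from collections import Counter


def quality_report(rows, required, valid_values=None):
    """Count blank required fields, out-of-range values and exact duplicate rows."""
    valid_values = valid_values or {}
    report = {"missing": Counter(), "invalid": Counter(), "duplicates": 0}
    seen = set()
    for row in rows:
        # Completeness: blank or absent values in required columns
        for col in required:
            if not (row.get(col) or "").strip():
                report["missing"][col] += 1
        # Validity: non-blank values outside the allowed set
        for col, allowed in valid_values.items():
            value = (row.get(col) or "").strip()
            if value and value not in allowed:
                report["invalid"][col] += 1
        # Uniqueness: a record identical to one already seen
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
        else:
            seen.add(key)
    return report


# Hypothetical sample: a misspelling, abbreviations, a blank DOB and a duplicated row
SAMPLE = """customer_id,gender,DOB
1,Female,1980-01-01
2,Femal,
3,M,1975-06-30
3,M,1975-06-30
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = quality_report(
    rows,
    required=["customer_id", "DOB"],
    valid_values={"gender": {"Female", "Male"}},  # assumed canonical values
)
print(report)
```

Each counter maps directly to an issue that could be listed in the email to the Team Lead.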
Here is the email I wrote for the first task, listing specific issues I found while working out the contents of each table and what each field represented.
Next, I cleaned the data in Excel: removing outliers, blank records and invalid records, correcting misspellings, and so on. This took the bulk of the time, as the data cleaning became an iterative process. In some instances it was difficult to decide how to deal with data points that had missing values, because removing those records could delete important chunks of the data. Such cases, however, accounted for only a small portion of the entire dataset, at around 4%. A couple of the cleaning steps are shown below.
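The cleaning itself was done in Excel. For comparison, the same kinds of steps can be sketched in Python, assuming rows are dictionaries of strings. The spelling map, the column names and the 1.5 * IQR outlier rule are assumptions for illustration; the write-up does not say which outlier rule was used. `missing_share` reports the fraction of records missing a value, which is the figure to weigh (like the roughly 4% above) before deciding whether to drop them.

```python
import statistics

# Hypothetical map of inconsistent spellings to canonical values
GENDER_MAP = {"F": "Female", "Femal": "Female", "M": "Male"}


def drop_blank(rows):
    """Remove records in which every field is empty."""
    return [r for r in rows if any((v or "").strip() for v in r.values())]


def standardise(rows, column, mapping):
    """Replace known misspellings or abbreviations in one column (in place)."""
    for row in rows:
        value = row.get(column)
        row[column] = mapping.get(value, value)
    return rows


def missing_share(rows, column):
    """Fraction of records with no value in `column`."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if not (r.get(column) or "").strip()) / len(rows)


def remove_outliers_iqr(rows, column, k=1.5):
    """Keep rows whose numeric `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [float(r[column]) for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= float(r[column]) <= high]


# Hypothetical records: misspellings, a blank gender, a blank row, an extreme price
rows = [
    {"customer_id": "1", "gender": "F", "list_price": "100"},
    {"customer_id": "2", "gender": "Femal", "list_price": "110"},
    {"customer_id": "3", "gender": "M", "list_price": "105"},
    {"customer_id": "4", "gender": "Male", "list_price": "95"},
    {"customer_id": "5", "gender": "", "list_price": "102"},
    {"customer_id": "6", "gender": "Female", "list_price": "9000"},
    {"customer_id": "", "gender": "", "list_price": ""},
]

rows = drop_blank(rows)  # must run before float() parsing of list_price
rows = standardise(rows, "gender", GENDER_MAP)
share = missing_share(rows, "gender")  # 1 of 6 records lacks a gender
cleaned = remove_outliers_iqr(rows, "list_price")
print(f"missing gender: {share:.1%}, rows kept: {len(cleaned)}")
```

The record with the blank gender is kept here rather than dropped, mirroring the concern that deleting incomplete records can throw away useful data.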
Some of the data cleaning steps
Task: "Create a PowerPoint presentation which outlines the approach we will be taking to identify which of the 1000 customers Sprocket Central Pty Ltd should target, based on this dataset. Explain the three phases: Data Exploration; Model Development and Interpretation."
I did all of my data exploration in Microsoft Excel, examining customer demographic trends, spending patterns and market share across customer locations. I then put the findings from this exploration into PowerPoint slides.
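As an illustration only, the aggregations behind spending patterns and market share by location can be sketched in Python: join each transaction to its customer on a customer ID, then total spend per segment. The field names (`customer_id`, `amount`, `age`, `state`) and the ten-year age bands are hypothetical, and this is not the Excel workbook that was actually used.

```python
from collections import defaultdict


def age_band(age, width=10):
    """Bucket an age into a label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def spend_by_segment(transactions, customers, key):
    """Total spend per segment, joining transactions to customers on customer_id."""
    totals = defaultdict(float)
    for t in transactions:
        customer = customers.get(t["customer_id"])
        if customer is None:
            continue  # skip transactions with no matching customer record
        totals[key(customer)] += t["amount"]
    return dict(totals)


def market_share(totals):
    """Convert segment totals into percentage shares of overall spend."""
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {segment: 100 * value / grand for segment, value in totals.items()}


# Hypothetical customer records keyed by customer_id
customers = {
    "1": {"age": 34, "state": "NSW"},
    "2": {"age": 45, "state": "VIC"},
    "3": {"age": 38, "state": "NSW"},
}
transactions = [
    {"customer_id": "1", "amount": 200.0},
    {"customer_id": "2", "amount": 100.0},
    {"customer_id": "3", "amount": 100.0},
    {"customer_id": "9", "amount": 50.0},  # no matching customer, ignored
]

by_state = spend_by_segment(transactions, customers, key=lambda c: c["state"])
by_age = spend_by_segment(transactions, customers, key=lambda c: age_band(c["age"]))
print(market_share(by_state))  # {'NSW': 75.0, 'VIC': 25.0}
print(by_age)                  # {'30-39': 300.0, '40-49': 100.0}
```

Passing the segment as a `key` function keeps one aggregation routine for every breakdown (state, age band, or any other demographic field).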
Task: "Please develop a dashboard that we can present to the client at our next meeting. Display your data summary and results of the analysis in a dashboard (see tools/references for assistance). Specifically, your presentation should specify who Sprocket Central Pty Ltd should be targeting out of the new 1000 customer list."
The dashboard was created in Power BI using the findings from the data exploration.
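The write-up does not describe the exact targeting logic behind the dashboard. One simple, rule-based way to turn exploration findings into a shortlist is sketched below: score each prospect by how many high-value segments it falls into and keep the top N. The priority segments, field names and the `shortlist` helper are placeholders, and the scoring rule is an assumption, not the method used for the dashboard.

```python
def age_band(age, width=10):
    """Bucket an age into a label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def segment_score(customer, priority_states, priority_bands):
    """Count how many (hypothetical) high-value segments a prospect falls into."""
    return int(customer["state"] in priority_states) + int(
        age_band(customer["age"]) in priority_bands
    )


def shortlist(new_customers, priority_states, priority_bands, top_n):
    """Return the top_n prospects by segment score; ties keep their input order."""
    ranked = sorted(
        new_customers,
        key=lambda c: segment_score(c, priority_states, priority_bands),
        reverse=True,  # sorted() stays stable with reverse=True
    )
    return ranked[:top_n]


# Placeholder segments, NOT the actual findings from the exploration
PRIORITY_STATES = {"NSW"}
PRIORITY_BANDS = {"30-39"}

prospects = [
    {"id": "A", "age": 52, "state": "QLD"},
    {"id": "B", "age": 33, "state": "NSW"},
    {"id": "C", "age": 36, "state": "VIC"},
    {"id": "D", "age": 41, "state": "NSW"},
]
targets = shortlist(prospects, PRIORITY_STATES, PRIORITY_BANDS, top_n=3)
print([c["id"] for c in targets])  # ['B', 'C', 'D']
```

A plain count treats every segment as equally important; weighting segments by their share of spend would be a natural refinement.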