This is a simple data analysis case study for a fictitious company called Social Buzz, completed as part of the Forage virtual internships.
This was another interesting case study and I really enjoyed working on it. It's something I imagine a lot of companies have had to deal with.
Social Buzz was founded by two former engineers from a large social media conglomerate, one from London and the other from San Francisco. They left in 2008 and both met in San Francisco to start their business. They started Social Buzz because they saw an opportunity to build on the foundation that their previous company started by creating a new platform where content took center stage. Social Buzz emphasizes content by keeping all users anonymous, only tracking user reactions on every piece of content. There are over 100 ways that users can react to content, spanning beyond the traditional reactions of likes, dislikes, and comments. This ensures that trending content, as opposed to individual users, is at the forefront of user feeds.
Over the past 5 years, Social Buzz has reached over 500 million active users each month. They have scaled quicker than anticipated and need the help of an advisory firm to oversee their scaling process effectively. Due to their rapid growth and the digital nature of their core product, the amount of data that they create, collect and must analyze is huge. Every day over 100,000 pieces of content, ranging from text and images to videos and GIFs, are posted. All of this data is highly unstructured and requires extremely sophisticated and expensive technology to manage and maintain. Out of the 250 people working at Social Buzz, 200 are technical staff working on maintaining this highly complex technology.
Up until this point, they have not relied on any third-party firms to help them get to where they are. However, there are 3 main reasons why they are now looking at bringing in external expertise:
An audit of their big data practice
Recommendations for a successful IPO
An analysis of their content categories that highlights the top 5 categories with the largest aggregate popularity (which is our main focus today)
This is the organization map of the Accenture team working on this fictitious project. It gives a holistic understanding of the team and of each individual's role and responsibilities in this internship.
How to capitalize on this much data?
Finding Social Buzz's top 5 most popular categories of content can shed light on topics the marketing team can target.
The goal is a final data set containing all of the columns needed to complete the task. It is possible to use Excel to create the required data set. Based on which columns will be most useful, we are going to merge the tables together using the unique keys within them.
I was given a set of data sets, all containing different columns and values, as well as a data model. A data model shows the relationships between all of the data sets, as well as any links that you can use to merge tables.
One of the first things to do is to build a data model that fulfills the requirements of this task.
Data Modeling using Power BI
This data set combined three datasets:
Reaction
Content
Reaction Types
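The merge itself was done in Power BI, but the idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column names (content_id, reaction_type, category, score), the sample rows, and the score values are all made up for the example and are not taken from the real data sets.

```python
# Hypothetical miniature versions of the three tables.
# Column names and values are assumptions made for illustration only.
reactions = [
    {"content_id": "c1", "reaction_type": "like"},
    {"content_id": "c1", "reaction_type": "hate"},
    {"content_id": "c2", "reaction_type": "love"},
    {"content_id": "c9", "reaction_type": "like"},  # no matching content row
]
content = [
    {"content_id": "c1", "category": "food"},
    {"content_id": "c2", "category": "travel"},
]
reaction_types = [
    {"reaction_type": "like", "score": 50},
    {"reaction_type": "hate", "score": 5},
    {"reaction_type": "love", "score": 65},
]


def merge_tables(reactions, content, reaction_types):
    """Inner-join each reaction with its content row and its reaction type row."""
    # Index the lookup tables by their unique keys.
    content_by_id = {row["content_id"]: row for row in content}
    type_by_name = {row["reaction_type"]: row for row in reaction_types}

    merged = []
    for reaction in reactions:
        content_row = content_by_id.get(reaction["content_id"])
        type_row = type_by_name.get(reaction["reaction_type"])
        if content_row is None or type_row is None:
            continue  # inner join: skip reactions without a match
        merged.append({
            "content_id": reaction["content_id"],
            "reaction_type": reaction["reaction_type"],
            "category": content_row["category"],
            "score": type_row["score"],
        })
    return merged


merged = merge_tables(reactions, content, reaction_types)
for row in merged:
    print(row)
```

Each reaction row picks up its category from the content table and its score from the reaction types table, which is the same thing the relationships in the data model express.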
The client wanted to see “An analysis of their content categories showing the top 5 categories with the largest aggregate popularity”. This meant that the client wanted to know which categories of their content had yielded the greatest popularity out of all their content. But how do we quantify popularity? This was explained in the data model. Popularity is quantified by the “Score” given to each reaction type, as a numeric value. Therefore each reaction contributes a weighting to how popular a piece of content becomes. To find the categories with the greatest popularity, the data analyst must sum the Score of every reaction within each content category and then pick the categories with the largest aggregate Score.
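As a rough sketch of that aggregation (not the Power BI steps used here), the sum per category and the top-N selection could look like this in Python. The rows, category names, and scores are invented for the example, and top_categories is a hypothetical helper name.

```python
from collections import defaultdict


def top_categories(rows, n=5):
    """Sum the score per category and return the n largest totals."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["score"]
    # Sort by aggregate score, largest first, and keep the first n.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up merged rows: one row per reaction, already carrying category and score.
rows = [
    {"category": "animals", "score": 60},
    {"category": "food", "score": 60},
    {"category": "animals", "score": 50},
    {"category": "travel", "score": 10},
]

top = top_categories(rows, n=2)
print(top)
```

With the real data, n would stay at its default of 5 to match the client's request.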
Data cleaning is a common and very important task when working with data. This includes removing columns that have a high number of missing values, removing rows that have values which are erroneous, changing the data type of some values within a column and also removing columns which are not relevant to this task. Your end result should be a set of relevant data sets that are clean with each data set containing only the columns which are relevant to the completion of this task.
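To make a few of these steps concrete, here is a minimal Python sketch that keeps only the relevant columns, drops rows with missing or erroneous values, and converts a column to a numeric type. It is an assumption-laden illustration: the column names, the sample rows, and the clean_rows helper are hypothetical, and it drops bad rows rather than whole columns.

```python
def clean_rows(rows, keep_columns, numeric_columns):
    """Keep relevant columns, drop incomplete rows, and cast numeric columns to int.

    numeric_columns is expected to be a subset of keep_columns.
    """
    cleaned = []
    for row in rows:
        # Keep only the columns relevant to the task.
        kept = {col: row.get(col) for col in keep_columns}
        # Drop rows where any relevant value is missing.
        if any(value is None or value == "" for value in kept.values()):
            continue
        # Change the data type of numeric columns; drop rows that fail to convert.
        try:
            for col in numeric_columns:
                kept[col] = int(kept[col])
        except ValueError:
            continue
        cleaned.append(kept)
    return cleaned


# Made-up raw rows with an irrelevant column, a missing value, and a bad number.
raw = [
    {"content_id": "c1", "category": "food", "score": "60", "url": "x"},
    {"content_id": "c2", "category": "", "score": "50", "url": "y"},
    {"content_id": "c3", "category": "travel", "score": "n/a", "url": "z"},
]

result = clean_rows(raw, ["content_id", "category", "score"], ["score"])
print(result)
```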
After a quick inspection of the columns in the resulting data set, I found that the Category column had some unclean data: some words were wrapped in double quotes, and the same words appeared with a mix of capital and lowercase letters. The problem this causes is that these variants are treated as different categories.
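The actual fix was applied in Power BI and Power Query, but the normalization can be sketched in Python as follows. The sample values and the clean_category helper are hypothetical, chosen only to show how quoted and differently capitalized variants collapse into a single category.

```python
def clean_category(raw):
    """Strip whitespace and surrounding double quotes, then lowercase."""
    return raw.strip().strip('"').strip().lower()


# Made-up examples of the kind of variants described above.
raw_categories = ["Animals", '"animals"', "animals", '"Food"', "food"]

# After cleaning, the five raw values collapse into two distinct categories.
cleaned = sorted({clean_category(value) for value in raw_categories})
print(cleaned)
```

Lowercasing is one possible convention; any consistent casing would work as long as every variant maps to the same value.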
Once the data is cleaned, we can export it from Power BI and Power Query.
Link to the data set made after the merge: https://docs.google.com/spreadsheets/d/1kTPKD6YIab5ep8EsXyrnBrzekmVjZnFszCEuFO-uq0I/edit?usp=sharing
Data pre-processing done using the R language in Kaggle
This is a simple dashboard that shows insights beyond what the task required. For this, I made slight changes to the data model so that the dashboard could include other insights from the user's profile.