About the company/Case Study
Business Task
Prepare Data
Process Data
Visualization and Analysis
Summary And Recommendation
This case study is about Bellabeat, a company founded by Urška Sršen and Sando Mur. Bellabeat is a high-tech company that manufactures health-focused smart products for women. These products collect data on activity, sleep, stress, and reproductive health, with the aim of empowering women with knowledge about their own health and habits.
Key stakeholders:
Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s co-founder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
The company believes that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. It has therefore requested an analysis of the available smart device usage data to gain insight into how people are already using their smart devices, followed by high-level recommendations for how these trends can inform Bellabeat’s marketing strategy.
What trends can be identified?
How would the identified trends help Bellabeat?
The data was collected from 30 users who volunteered their data for this purpose. The data is in the public domain, is made available through Mobius, and can be viewed here.
For this project, we will be using BigQuery to store our data.
While uploading the data to BigQuery, I noticed an issue with the time format. I therefore changed the time field in Excel to the datetime format (0001-01-01 00:00:00) before uploading it to BigQuery. Below is a sample screenshot of the result after the upload.
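The same conversion can also be scripted rather than done by hand in Excel. The sketch below is a minimal Python illustration, not the method used in this project. It assumes the raw export stores timestamps in a 12-hour `M/D/YYYY H:MM:SS AM/PM` layout (an assumption, since the original format is not stated above), and the helper name `to_bigquery_datetime` is hypothetical.

```python
from datetime import datetime

# Assumed input layout (not confirmed by the source), e.g. "4/12/2016 7:21:00 AM"
RAW_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Target layout matching the 0001-01-01 00:00:00 pattern
TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_bigquery_datetime(raw: str) -> str:
    """Convert one raw timestamp string to YYYY-MM-DD HH:MM:SS."""
    return datetime.strptime(raw.strip(), RAW_FORMAT).strftime(TARGET_FORMAT)


print(to_bigquery_datetime("4/12/2016 7:21:00 AM"))
```

Scripting the step makes the conversion repeatable across all the CSV files, and a value that does not match the assumed layout raises a `ValueError` instead of being silently mangled.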
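As stated earlier, the data comes from 30 users who volunteered their data. I assessed it against the usual criteria (reliable, original, comprehensive, current, and cited) and treated it as fit for this analysis. Note that 30 self-selected volunteers is a small sample, so the findings should be read as indicative rather than as representative of all smart device users.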
I used Excel filters and SELECT queries to check for irregularities and confirm that the data is clean and ready for use.
The data consists of daily, hourly, and minute-level data, spread across 18 CSV files. These have been uploaded to BigQuery, with each table named after its CSV file. In this project, my focus is on the daily data, as I want to analyze the daily usage of smart devices. I will therefore work with the daily activity, daily calories, daily intensities, daily steps, heart rate, and sleep day data. I will not use the weight log data, as it is insufficient: it has only 68 records from 30 users over 30 days.
The heart rate data was recorded per second. To make it usable for my daily analysis, I aggregated it to one row per user per day with the SQL script below.
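SELECT
  id,
  SUM(value) AS TotalHeartRate,  -- sum of the day's readings; used here only to get one row per user per day
  CAST(time AS DATE) AS Day      -- truncate the per-second timestamp to a calendar date
FROM `fitabase-376220.fitabase.heartrate_seconds_merged`
GROUP BY id, Day
ORDER BY id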
I exported the result to a CSV file. The output had only 132 rows, which suggested that some users had no heart rate records. To confirm this, I used an Excel pivot table to group the output by ID.
The pivot table shows that heart rate data was captured for only 7 of the 30 users, so I dropped the heart rate data as insufficient to draw conclusions from. This leaves five data sets: daily activity, daily calories, daily intensities, daily steps, and sleep day.
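The coverage check can also be done in code rather than with a pivot table. The sketch below is a hedged Python illustration on made-up sample rows, not the real dataset. It groups readings by user and calendar day, then counts how many distinct users have any data. Note that it takes the mean of each day's readings rather than the sum used in the query above, since a daily mean is easier to interpret. The function name `daily_heart_rate` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows (id, timestamp, beats per minute); not taken from the real dataset
readings = [
    ("user_a", "2016-04-12 07:21:00", 97),
    ("user_a", "2016-04-12 07:21:05", 101),
    ("user_a", "2016-04-13 08:00:00", 80),
    ("user_b", "2016-04-12 09:15:00", 64),
]


def daily_heart_rate(rows):
    """Group readings by (id, calendar day) and return the mean bpm per group."""
    groups = defaultdict(list)
    for user_id, timestamp, bpm in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        groups[(user_id, day)].append(bpm)
    return {key: sum(values) / len(values) for key, values in groups.items()}


daily = daily_heart_rate(readings)
# Distinct users that have at least one day of heart rate data
users_with_data = {user_id for user_id, _ in daily}

print(len(daily), "user-days from", len(users_with_data), "users")
```

Comparing the number of distinct users against the expected 30 gives the same answer as the pivot table: if far fewer users appear, the table is too sparse to support conclusions.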
Further review shows that daily calories, daily intensities, and daily steps have already been merged into daily activity. I therefore merged the activity data with the sleep data so that all the data is in a single place for analysis, using the script below:
SELECT
  a.Id,
  a.ActivityDate,
  a.TotalSteps,
  a.TotalDistance,
  a.TrackerDistance,
  a.LoggedActivitiesDistance,
  a.VeryActiveDistance,
  a.ModeratelyActiveDistance,
  a.LightActiveDistance,
  a.SedentaryActiveDistance,
  a.VeryActiveMinutes,
  a.FairlyActiveMinutes,
  a.LightlyActiveMinutes,
  a.SedentaryMinutes,
  a.Calories,
  b.TotalSleepRecords,
  b.TotalMinutesAsleep,
  b.TotalTimeInBed
FROM
  `fitabase-376220.fitabase.dailyActivity_merged` AS a
  -- LEFT JOIN keeps every activity row; sleep fields are NULL where no sleep record matches
  LEFT JOIN `fitabase-376220.fitabase.sleepDay_merged` AS b
    ON a.Id = b.Id AND a.ActivityDate = b.SleepDay
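To make the behavior of the left join concrete, here is a small Python sketch on made-up rows (the values are illustrative, not from the dataset). It keeps every activity row and attaches sleep minutes only where the user ID and date match, which is the same rule the join condition above expresses. The function name `left_join` is hypothetical.

```python
# Made-up sample rows; the keys mirror the join condition (Id, date)
activity = [
    {"Id": 1, "ActivityDate": "2016-04-12", "TotalSteps": 12000},
    {"Id": 1, "ActivityDate": "2016-04-13", "TotalSteps": 9000},
    {"Id": 2, "ActivityDate": "2016-04-12", "TotalSteps": 8000},
]
sleep = [
    {"Id": 1, "SleepDay": "2016-04-12", "TotalMinutesAsleep": 327},
]


def left_join(activity_rows, sleep_rows):
    """Keep every activity row; attach sleep minutes where (Id, date) matches."""
    sleep_by_key = {(s["Id"], s["SleepDay"]): s for s in sleep_rows}
    merged = []
    for a in activity_rows:
        match = sleep_by_key.get((a["Id"], a["ActivityDate"]))
        minutes = match["TotalMinutesAsleep"] if match else None
        merged.append({**a, "TotalMinutesAsleep": minutes})
    return merged


merged = left_join(activity, sleep)
for row in merged:
    print(row)
```

Two caveats apply. Unlike the SQL join, the dictionary lookup keeps only one sleep row per key, whereas duplicated sleep rows would multiply activity rows in SQL, so it may be worth checking the sleep table for duplicates before merging. The match also depends on both date columns having the same type and format.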
The output of the script is stored as a CSV file, as I will do the visualization and analysis in Tableau.
I chose Tableau because it is a visualization and business intelligence tool that supports multiple data sources and makes data easier to understand.
After importing the data into Tableau, I checked that every field was assigned the correct data type, so the data is ready for analysis. Let's have a look at what this looks like:
We are now ready to look for trends in the data.
I started by plotting Total Steps against Calories to see whether a relationship exists. The chart shows a strong positive correlation between the two: the more steps a user takes, the more calories they tend to burn. An almost identical relationship appears when Total Distance is plotted against Calories. See both charts below.
Chart showing Calories plotted against Steps
Chart showing Calories plotted against Total Distance
The similarity between the two charts should not be a surprise: the more distance a user covers, the more steps they take. The chart below shows the strong correlation between steps and distance.
Chart showing a strong correlation between steps and distance
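The strength of the relationships seen in these charts can be quantified with the Pearson correlation coefficient, which ranges from -1 (perfect inverse relationship) to 1 (perfect positive relationship). The sketch below is a minimal Python illustration on made-up values. The analysis in this project was done visually in Tableau, so the function `pearson_r` and the sample numbers are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not rows from the dataset
total_steps = [2000, 5000, 8000, 11000, 14000]
calories = [1700, 1950, 2300, 2500, 2900]

r = pearson_r(total_steps, calories)
print(f"r = {r:.3f}")
```

A value close to 1 supports describing the relationship as a strong positive correlation. It does not by itself show that one variable causes the other.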
Hence, the recommendation is that users who want to burn more calories should consider taking more steps or covering more distance. To back this recommendation, I also checked whether the per-user data shows the same pattern. See the next two charts.
Two charts ranking users by calories burned and by steps taken
The first chart shows a ranking of users by calories burned.
From the chart (Calories by Users), notice the top two users by calories burned, and the user with the lowest calories burned.
The second chart shows a treemap of users by number of steps taken. Notice that the user with the most steps is the user with the second-highest calories burned, and the user with the fewest steps is the user with the lowest calories burned.
This does not imply that distance and steps are the only factors in burning calories; however, we can conclude that they play a key role.
Could there be a relationship between users' activity level and calories burned?
Well, there is one way of finding out.
First, I categorized users by activity level. The data has four categories: fairly active, lightly active, very active, and sedentary. For this analysis, I group fairly, lightly, and very active into one bucket and treat sedentary separately.
Starting with the first bucket, I looked at how many users fall into each of the three categories: fairly, lightly, and very active. See the plot below.
Chart showing user categories
From the above chart, more users are lightly active than fairly or very active. I then checked whether there is a relationship between these three categories and calories burned. Below is the output.
What we can see from the above is a positive correlation between active minutes and calories burned, and it is most evident for very active users. I believe this makes sense.
Well, that's not all. Let's see how sedentary minutes compare against calories.
Chart showing Sedentary vs Calories
Judging by the trend line on the chart, there is a slight inverse relationship between the two.
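The direction of a trend line can be checked numerically from its slope. The sketch below assumes the trend line is a linear least-squares fit and uses made-up values rather than the real data. The function name `trend_line` is hypothetical. A negative slope corresponds to an inverse relationship, and a slope close to zero corresponds to a slight one.

```python
def trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, not rows from the dataset
sedentary_minutes = [600, 800, 1000, 1200, 1400]
calories = [2600, 2500, 2450, 2300, 2250]

slope, intercept = trend_line(sedentary_minutes, calories)
print(f"slope = {slope:.3f} calories per sedentary minute")
```

Reporting the slope alongside the chart makes "slight" concrete: it states how many calories the fitted line changes by for each additional sedentary minute.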
I'd recommend that, to burn more calories, users should spend more of their time being active and less of it sedentary.
Next, let's plot time in bed against total sleep to see if any relationship exists. The chart below shows a very strong positive correlation between time in bed and sleep recorded. Therefore, we can say that to record more sleep, users should spend more time in bed.
Chart Showing Time in Bed against Sleep Recorded
I also looked at whether there is a relationship between sleep and active distance, and found something interesting: an inverse relationship. The chart below has three categories of active distance (lightly active, moderately active, and very active) and shows a negative correlation between sleep and active distance. This implies that the more active distance a user covers, the less sleep is recorded. Personally, I would have expected the opposite, but the data says otherwise.
Chart showing Sleep plotted against Active Distance
The insights below come from the data and the analysis above.
We found a strong correlation between Total Steps, Total Distance, and Calories burned. This makes sense, as users who take more steps and cover more distance would be expected to burn more calories.
The majority of the users are lightly active. However, users with more very active minutes tend to burn more calories.
There is a slight inverse relationship between sedentary minutes and calories burned.
We also saw a strong correlation between time spent in bed and total minutes of sleep recorded for users.
There is an inverse relationship between active distance and sleep. This is interesting because it suggests that users who want more sleep could consider reducing the distance covered or keeping it moderate.
Data regarding the user's other activities such as diets would also need to be evaluated just to see if other patterns exist.
Few users keep a record of their weight.
Based on the observed trends, the following recommendations are made.
Bellabeat users should be encouraged to take more steps if they want to burn more calories. This can be done by sending notifications to users' devices. Incentives could also be offered to the users with the most steps, for example as a competition among users. Such incentives should target users who are particularly interested in burning calories.
Since we saw an inverse relationship between sedentary minutes and calories, users should be prompted to take regular walks to reduce the time they spend sedentary. This can also be done with pop-up reminders on users' devices to leave a sedentary position for a more active one.
Since users who spend more time in bed sleep more, users should be advised to follow bedtime routines and timetables to improve their sleep. Bellabeat can provide them with a template for bedtime routines.
To get more sleep, users may consider spending less time being active. More data would be needed to back this up.
Users should be constantly reminded to keep a log of their weight journey as well.