This analysis is the capstone project of the Google Data Analytics course I completed.
Project Details: Bellabeat is a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but it has the potential to become a large player in the global smart device market. The objective is to analyze users' device usage data and gain insights that help develop future strategies.
Here we analyze Fitbit users' data on daily activity, heart rate, sleep, calories, etc. and apply the findings to Bellabeat's customer base.
Step 1 : Ask
Step 2 : Prepare
Step 3 : Process
Step 4 : Analyze
Step 5 : Share
Deliverable: A clear statement of the business task
The business task is to use the Fitbit Fitness Tracker Data to analyze how users use their devices and to gain insights that help the stakeholders make decisions about future strategies. The analysis addresses these questions:
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
Key stakeholders: Urska Srsen (cofounder and Chief Creative Officer) and Sando Mur (cofounder and key member of the Bellabeat executive team)
Deliverable: A description of all data sources used
This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors / preferences.
License - CC0: Public Domain
Acknowledgement - Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa (https://zenodo.org/record/53894#.YMoUpnVKiP9)
This dataset contains data in CSV format: Activity, Calories, Intensity, METs, Steps, Heart rate, Sleep and Weight data. The data is recorded at second, minute, hourly and daily granularity; for example, heart rate data is recorded at the seconds level. Steps, Intensity and Calories are available in both narrow (long) and wide formats, while METs data is available only in narrow format. I sorted the data to get a better understanding of it and filtered it to find any outliers at an early stage.
A data dictionary explaining the properties of the various tables in the dataset is available on the Fitabase website: https://www.fitabase.com/media/1930/fitabasedatadictionary102320.pdf. The data is secondary and comes from a credible source.
Limitations - The data is outdated and the sample size is too small, so further investigation with a larger sample is needed to obtain accurate insights.
However, this can be a start for further analysis.
Deliverable: Documentation of any cleaning or manipulation of data
I used Excel and BigQuery SQL to clean the data.
Cleaning:
Renamed the columns for convenience while following naming conventions.
Removed 4 rows with inconsistent values from the ‘dailyActivity’ data, using a WHERE clause in SQL to filter them out, and saved the result as a .csv file.
Removed outliers that were not consistent with the rest of the data.
Corrected entries with illogical values.
Formatted every column of every table to a consistent standard.
Split Date and Time into separate columns.
Removed white spaces.
Divided METs by 10 to get the actual values, since the exported METs values are multiplied by 10 (as noted in the Fitabase data dictionary).
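The timestamp split, white-space removal and METs rescaling above can be sketched in Python for a single row. This is a minimal illustration, not the Excel/BigQuery steps actually used; it assumes the minute-level METs export has `Id`, `ActivityMinute` (formatted like `4/12/2016 1:05:00 PM`) and `METs` columns, and the sample row is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout of the Fitbit export, e.g. "4/12/2016 1:05:00 PM".
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def clean_mets_row(row: dict) -> dict:
    """Clean one METs row: trim white space, split date/time, rescale METs."""
    # Trim white space from column names and string values.
    row = {
        key.strip(): value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }
    ts = datetime.strptime(row["ActivityMinute"], TIMESTAMP_FORMAT)
    return {
        "Id": row["Id"],
        "ActivityDate": ts.date().isoformat(),  # "YYYY-MM-DD"
        "ActivityTime": ts.time().isoformat(),  # 24-hour "HH:MM:SS"
        "METs": int(row["METs"]) / 10,  # exported METs are multiplied by 10
    }


# Hypothetical raw row with stray white space.
raw = {" Id ": "user_a", "ActivityMinute": " 4/12/2016 1:05:00 PM ", "METs": "12"}
print(clean_mets_row(raw))
```

Converting the 12-hour clock to a 24-hour `HH:MM:SS` string during the split avoids ambiguous AM/PM text in later SQL comparisons.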
Deliverable: A summary of your analysis
I used BigQuery SQL to perform the analysis, after loading the processed data into BigQuery on Google Cloud.
The sample size is 33 users and the data covers a period of 31 days. The average number of usage days is 28, with a standard deviation of 6 days. Based on the number of days on which each user has activity records, I divided the users into five categories: Everyday Users, Usually Users, Frequent Users, Seldom Users and Rarely Users:
-- Count the distinct days on which each user has a dailyActivity record
WITH
  user_usage_days AS (
    SELECT
      Id,
      COUNT(DISTINCT ActivityDate) AS number_usage_days
    FROM
      `bellbeat101.fitbit.dailyActivity`
    GROUP BY
      Id
  )
-- Bucket users by number of usage days and count the users in each group
SELECT
  CASE
    WHEN number_usage_days = 31 THEN "Everyday Users"
    WHEN number_usage_days BETWEEN 28 AND 30 THEN "Usually Users"
    WHEN number_usage_days BETWEEN 21 AND 27 THEN "Frequent Users"
    WHEN number_usage_days BETWEEN 14 AND 20 THEN "Seldom Users"
    ELSE "Rarely Users"
  END AS user_group,
  COUNT(Id) AS number_of_users
FROM
  user_usage_days
GROUP BY
  user_group
ORDER BY
  number_of_users DESC
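The same bucketing, together with the mean and standard deviation of usage days quoted above, can be reproduced in Python as a sanity check on the SQL output. The per-user day counts below are hypothetical; in practice they would come from the `user_usage_days` CTE.

```python
from collections import Counter
from statistics import mean, stdev


def usage_group(days: int) -> str:
    """Map a user's number of distinct usage days (out of 31) to a group."""
    if days == 31:
        return "Everyday Users"
    if 28 <= days <= 30:
        return "Usually Users"
    if 21 <= days <= 27:
        return "Frequent Users"
    if 14 <= days <= 20:
        return "Seldom Users"
    return "Rarely Users"  # fewer than 14 days, like the SQL ELSE branch


# Hypothetical distinct-day counts per user.
usage_days = {"u1": 31, "u2": 31, "u3": 29, "u4": 24, "u5": 16, "u6": 10}
days = list(usage_days.values())

groups = Counter(usage_group(d) for d in days)
for group, n in groups.most_common():
    print(f"{group}: {n} users ({n / len(days):.0%})")
print(f"mean = {mean(days):.1f} days, sample sd = {stdev(days):.1f} days")
```

`stdev` is the sample standard deviation; the text does not say whether the quoted 6 days used the sample or the population formula (`pstdev`).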
The sleep data has records for only 24 users. We divide average daily sleep into three ranges: less than 6 hours, 6-9 hours and more than 9 hours. The query below groups the sleep records of one usage group (here, Everyday Users) by average daily sleep; changing the usage-day filter gives the breakdown for the other groups:
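WITH
  -- Distinct days on which each user has a dailyActivity record
  user_usage_days AS (
    SELECT
      Id,
      COUNT(DISTINCT ActivityDate) AS number_usage_days
    FROM
      `bellbeat101.fitbit.dailyActivity`
    GROUP BY
      Id
  ),
  -- Average hours asleep per night for each user with sleep records
  sleep AS (
    SELECT
      Id,
      AVG(TotalMinutesAsleep) / 60 AS average_sleep_hours
    FROM
      `bellbeat101.fitbit.sleepDay`
    GROUP BY
      Id
  )
SELECT
  CASE
    WHEN average_sleep_hours < 6 THEN "Less than 6 hrs of sleep"
    WHEN average_sleep_hours BETWEEN 6 AND 9 THEN "6-9 hrs of sleep"
    WHEN average_sleep_hours > 9 THEN "More than 9 hrs of sleep"
    ELSE "Error"  -- only reachable if the average is NULL
  END AS sleep_hours,
  COUNT(Id) AS number_of_users
FROM
  sleep
WHERE
  -- Restrict to Everyday Users (31 usage days); adjust to analyze other groups
  Id IN (
    SELECT
      Id
    FROM
      user_usage_days
    WHERE
      number_usage_days = 31
  )
GROUP BY
  sleep_hours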
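The conversion from minutes to hours and the inclusive 6-9 hour boundary can be checked with a small Python sketch that also turns counts into the percentages used in the findings. The nightly `TotalMinutesAsleep` values below are hypothetical.

```python
from collections import Counter


def sleep_bucket(avg_hours: float) -> str:
    """Bucket a user's average nightly sleep, mirroring the SQL CASE."""
    if avg_hours < 6:
        return "Less than 6 hrs of sleep"
    if avg_hours <= 9:  # SQL BETWEEN 6 AND 9 is inclusive at both ends
        return "6-9 hrs of sleep"
    return "More than 9 hrs of sleep"


# Hypothetical TotalMinutesAsleep values per user (one value per night).
minutes_asleep = {
    "u1": [420, 390, 450],
    "u2": [300, 330],
    "u3": [600, 580],
    "u4": [360],
}

# Average minutes per night, converted to hours.
avg_hours = {uid: sum(m) / len(m) / 60 for uid, m in minutes_asleep.items()}
counts = Counter(sleep_bucket(h) for h in avg_hours.values())
for bucket, n in counts.items():
    print(f"{bucket}: {n} users ({n / len(avg_hours):.0%})")
```

A user averaging exactly 6.0 hours lands in the 6-9 hour group, matching the SQL.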
Heart rate data contains records for only 14 of the 33 users, and users have different numbers of hours of data per day. Based on their average hours of heart rate records per day, I divided these users into three groups: 'Less than 12 hours of heart rate data', 'Between 12 and 18 hours of heart rate data' and 'Between 18 and 24 hours of heart rate data':
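-- heart_summary: per-user summary table prepared beforehand (assumed to hold one
-- row per Id, with average_no_of_hours = average hours of heart rate data per day)
SELECT
  CASE
    WHEN average_no_of_hours < 12 THEN "Less than 12 hours of heart rate data"
    -- CASE takes the first match, so an average of exactly 18 falls in this group
    WHEN average_no_of_hours BETWEEN 12 AND 18 THEN "Between 12 and 18 hours of heart rate data"
    WHEN average_no_of_hours BETWEEN 18 AND 24 THEN "Between 18 and 24 hours of heart rate data"
    ELSE "Error"
  END AS heart_rate_hours_records,
  COUNT(Id) AS number_of_users
FROM
  `bellbeat101.fitbit.heart_summary`
GROUP BY
  heart_rate_hours_records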
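The query reads `average_no_of_hours` from a prepared `heart_summary` table whose construction is not shown. One plausible way to derive it, assumed here rather than taken from the analysis, is to count the distinct clock hours that contain at least one heart rate reading on each day and average that count over the days each user has data. A Python sketch with hypothetical readings:

```python
from collections import defaultdict
from datetime import datetime


def average_hours_per_day(readings):
    """readings: iterable of (user_id, datetime). Returns {user_id: avg hours/day}.

    An hour counts as recorded if it contains at least one reading.
    """
    hours = defaultdict(lambda: defaultdict(set))  # user -> date -> {hour}
    for user_id, ts in readings:
        hours[user_id][ts.date()].add(ts.hour)
    return {
        user_id: sum(len(h) for h in by_day.values()) / len(by_day)
        for user_id, by_day in hours.items()
    }


def heart_rate_bucket(avg_hours: float) -> str:
    """Same boundaries as the SQL CASE (18 falls in the 12-18 group)."""
    if avg_hours < 12:
        return "Less than 12 hours of heart rate data"
    if avg_hours <= 18:
        return "Between 12 and 18 hours of heart rate data"
    return "Between 18 and 24 hours of heart rate data"


# Hypothetical heart rate timestamps for one user.
readings = [
    ("u1", datetime(2016, 4, 12, 0, 5)),
    ("u1", datetime(2016, 4, 12, 0, 50)),  # same hour, counted once
    ("u1", datetime(2016, 4, 12, 13, 0)),
    ("u1", datetime(2016, 4, 13, 9, 30)),
]
summary = average_hours_per_day(readings)
print(summary, heart_rate_bucket(summary["u1"]))
```

Days with no readings at all are left out of the average in this sketch; including them as zero-hour days would lower every user's figure.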
Deliverable: Supporting Visualizations and key findings
On average, a Fitbit user spends 81% of the day sedentary and 19% of the day active. This shows that the device is not only used for tracking fitness metrics but is also worn casually.
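One way to arrive at a sedentary/active split like this is to pool the minute columns of `dailyActivity`. The Python sketch below assumes the Fitbit columns `SedentaryMinutes`, `LightlyActiveMinutes`, `FairlyActiveMinutes` and `VeryActiveMinutes` and uses hypothetical rows; the original figure may instead have averaged per-user shares, which can give a slightly different result.

```python
ACTIVE_COLUMNS = ("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")


def sedentary_share(rows):
    """Return (sedentary share, active share) of all tracked minutes."""
    sedentary = sum(r["SedentaryMinutes"] for r in rows)
    active = sum(r[c] for r in rows for c in ACTIVE_COLUMNS)
    total = sedentary + active
    return sedentary / total, active / total


# Hypothetical daily rows.
rows = [
    {"SedentaryMinutes": 800, "LightlyActiveMinutes": 150,
     "FairlyActiveMinutes": 20, "VeryActiveMinutes": 30},
    {"SedentaryMinutes": 1000, "LightlyActiveMinutes": 200,
     "FairlyActiveMinutes": 0, "VeryActiveMinutes": 0},
]
sed, act = sedentary_share(rows)
print(f"sedentary {sed:.0%}, active {act:.0%}")
```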
Only 24 of the 33 users have sleep records. One possible reason is that users find the device uncomfortable to wear while sleeping. Among these users, the average number of days with sleep records is 17.
With the users divided into groups by usage, the following visual shows how the users are distributed across the groups:
Almost 64% of all users (the Everyday Users) wear the device daily. Their daily steps, distance and amount of sleep are represented by:
From the visuals above, the most active part of the day for this group is between 6 am and 7 pm.
Sunday has the lowest activity, possibly because it is the weekend; the group is mostly active on weekdays.
Almost 69% of the group get the recommended 6-9 hours of sleep daily, and 25% get less than 6 hours.
This is the most active group of all, with nearly 8,000 average daily steps and an average daily distance of 5.7 km.
Its average number of active minutes is the second highest among all the groups.
The calories visuals show that most calories are burnt in the morning and afternoon.
The peak is found to be during late afternoon to early evening.
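Findings such as the least active weekday and the most active hours come from averaging hourly records by weekday and by hour of day. A Python sketch, assuming (timestamp, step total) pairs like those in the Fitbit `hourlySteps` table; the rows are hypothetical, and hourly calories could be passed in the same way.

```python
from collections import defaultdict
from datetime import datetime

# Fixed English names, so the result does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def average_by(rows, key):
    """Average the value in (timestamp, value) rows, grouped by key(timestamp)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in rows:
        k = key(ts)
        totals[k] += value
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}


# Hypothetical hourly step totals.
rows = [
    (datetime(2016, 4, 17, 10), 300),   # Sunday
    (datetime(2016, 4, 18, 7), 1200),   # Monday
    (datetime(2016, 4, 18, 18), 900),   # Monday
    (datetime(2016, 4, 19, 7), 1000),   # Tuesday
]

by_weekday = average_by(rows, lambda ts: WEEKDAYS[ts.weekday()])
by_hour = average_by(rows, lambda ts: ts.hour)
print("least active weekday:", min(by_weekday, key=by_weekday.get))
print("most active hour:", max(by_hour, key=by_hour.get))
```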
The Usually Users group comprises 18% (6 users) of all users. Their daily steps, distance and amount of sleep are represented by:
Users in this group used the device on 28-30 days in total.
The most active part of the day is between 6 am and 10 pm. The late-night activity may be because these users stay awake late.
The most active days are weekend days and the least active days are mostly weekdays. On Saturday and Sunday, the most active hours are from morning to afternoon, possibly because of high-intensity activities such as workouts, hiking or cycling.
On average, 60% of the users in this group get the recommended amount of sleep per day.
As this group is mostly active only on Saturdays and Sundays, its average daily steps and distance are lower, at nearly 7,000 steps and 5 km respectively.
This group has the lowest sedentary time, possibly because a significant number of its users are not getting enough sleep (time asleep may be counted as sedentary time).
Calorie burn peaks in the early evening, while most calories are burnt in the morning.
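A peak in the evening and "most calories" in the morning are not contradictory once the two measures are separated: the peak is the single hour with the highest value, while "most calories" is the largest total over a period of the day. The period boundaries in this Python sketch are a hypothetical choice, and the hourly values are made up.

```python
from collections import defaultdict


def part_of_day(hour: int) -> str:
    """Hypothetical split of the 24-hour clock into named periods."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"


def calories_by_part_of_day(hourly):
    """hourly: iterable of (hour, calories). Returns total calories per period."""
    totals = defaultdict(float)
    for hour, calories in hourly:
        totals[part_of_day(hour)] += calories
    return dict(totals)


# Hypothetical hourly calories for one day.
hourly = [(7, 120), (9, 110), (13, 100), (18, 150), (23, 60)]
totals = calories_by_part_of_day(hourly)
peak_hour = max(hourly, key=lambda hc: hc[1])[0]  # single highest hour
top_period = max(totals, key=totals.get)          # largest total per period
print(totals, "peak hour:", peak_hour, "top period:", top_period)
```

Here the peak hour (18, early evening) and the period with the most calories (Morning) differ, which is the pattern described for this group.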
Frequent Users make up 6% (2 users) of all users. Their daily steps, distance and amount of sleep are represented by:
Frequent Users used the device on 21-27 days in total.
The visuals above show that Monday is the most active day while weekend days are the least active, possibly because weekends are rest days.
I also noticed that the other weekdays show almost uniform activity every day between 6 am and 8 am, which suggests that these users work out regularly in the morning.
None of the users in this group get the recommended 6-9 hours of sleep.
These users have few active minutes, but their activity is generally intense and consistent.
Although these users sleep little, their sedentary time is high. Steps are consistent from late morning to evening, which suggests that the high sedentary time may be due to the nature of their daily work, such as a desk job or other indoor job.
As noted above, this group works out on weekday mornings: calorie burn peaks at 8 am, while the largest share of calories is burnt in the evening.
The Seldom Users group contains 9% of the sample (3 users). Their daily steps, distance and amount of sleep are represented by:
These users wore the device on 14-20 days in total.
The most active day is Saturday, from 8 am to 1 pm.
Only 1 user in this group has sleep records, and this user gets the recommended 6-9 hours of sleep.
The data shows that this user keeps a very regular sleep schedule: the wake-up time is between 6 and 7 am every day.
These users mostly wear the device casually, and on some days to track fitness metrics.
This group has the highest average active minutes per day, with an average total distance of 5.3 km and just above 7,700 steps.
For this group, morning is the most active period, so calorie burn is also highest in the morning.
The Rarely Users group has just 1 user. This user's daily steps, distance and amount of sleep are represented by:
This user used the device on fewer than 14 days.
The user wore the device on only 3 of the 7 days of the week.
The visuals show that on Friday the user took the device off at 4 pm.
This suggests that usage within each day is also low.
The usage data suggests that the user wore the device mostly for casual purposes.
This group has the highest average sedentary time and is the least active, with only around 3,800 steps and 2.9 km of total distance per day on average.
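Statements like "took the device off at 4 pm on Friday" and "used the device on 3 days of the week" can be derived from the first and last reading on each day. This Python sketch assumes a record exists only while the device is worn (for example, heart rate readings); minute-level step data may also contain zero-step rows while the device is off the wrist, so those would need filtering first. The timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def wear_windows(timestamps):
    """Return {date: (weekday, first reading, last reading)} for one user."""
    by_day = defaultdict(list)
    for ts in timestamps:
        by_day[ts.date()].append(ts)
    return {
        day: (WEEKDAYS[day.weekday()], min(times).time(), max(times).time())
        for day, times in sorted(by_day.items())
    }


# Hypothetical readings for one user.
readings = [
    datetime(2016, 4, 15, 8, 0),
    datetime(2016, 4, 15, 16, 0),  # last reading on Friday
    datetime(2016, 4, 16, 10, 30),
]
windows = wear_windows(readings)
for day, (weekday, first, last) in windows.items():
    print(day, weekday, first, last)
print("distinct weekdays worn:", len({w[0] for w in windows.values()}))
```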
Heart rate records are present for only 14 of the 33 users.
Of these 14 users, 57% (8 users) have 18-24 hours of heart rate records per day on average.
I isolated the users with 18-24 hours of records and analyzed their data.
The graph below shows heart rate fluctuations throughout the day.
The fluctuations are highest during the most active parts of the day and lowest while users are asleep.
The data shows that heart rate starts to rise at 6 am, when users wake up and their activities begin.
The highest fluctuation is in the early evening, possibly because of fitness activities during that period.
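An hour-of-day heart rate profile like the one in the graph can be sketched in Python by grouping readings by hour and measuring spread within each hour. Using the standard deviation as the measure of "fluctuation" is an assumption made here; the findings above were read off a chart. The readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def hourly_profile(readings):
    """readings: iterable of (datetime, bpm). Returns {hour: (mean bpm, std dev)}."""
    by_hour = defaultdict(list)
    for ts, bpm in readings:
        by_hour[ts.hour].append(bpm)
    return {h: (mean(v), pstdev(v)) for h, v in sorted(by_hour.items())}


# Hypothetical heart rate readings (bpm).
readings = [
    (datetime(2016, 4, 12, 3, 0), 58),
    (datetime(2016, 4, 12, 3, 30), 60),    # asleep: low and steady
    (datetime(2016, 4, 12, 18, 0), 70),
    (datetime(2016, 4, 12, 18, 20), 130),  # workout: large swing
]
profile = hourly_profile(readings)
most_variable = max(profile, key=lambda h: profile[h][1])
print(profile, "most variable hour:", most_variable)
```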
Deliverable: Your top high-level insights based on your analysis
The analysis shows that most users wear their smart devices not only for fitness purposes but also casually.
Fewer users wear the device during sleep: only 73% (24 of 33) have sleep records.
Heart rate records are lacking (only 14 users have heart rate records).
Manually entered data, such as the weight log, is rarely recorded by users.
There are three types of users: users who wore the device all of the time, both for fitness goals and casually; users who wore the device just to track physical fitness goals; and users who wore the device casually.
Communicating about Bellabeat, its products and how to use them is very important. Stakeholders need to tailor advertising to the different types of users.
Find out, through surveys and feedback forms, why the device is less often worn while sleeping.
Build a more accurate and consistent heart rate recording system, which could also support an alert mechanism for large heart rate fluctuations.
Develop a mobile application that can target different types of audiences based on their requirements. Based on the observed usage patterns, regular reminders for sleep, exercise, sedentary time and manual weight logging could help users improve their fitness.
As Bellabeat's target audience is women, a feature for tracking menstrual health through manual input could help users track their cycles.