Bellabeat, a high-tech company that manufactures health-focused smart products, has positioned itself in recent years as a tech-driven wellness company for women. Bellabeat develops wearables and accompanying products that monitor biometric and lifestyle data to help women better understand how their bodies work and make healthier choices.
Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. An in-depth analysis of Bellabeat's available consumer data could reveal more opportunities for growth, so the marketing analytics team was tasked with analyzing the data to gain insights into how people are using their smart devices.
Using this information, the Chief Creative Officer would like high-level recommendations for how these trends can help guide Bellabeat's marketing strategies.
As a junior data analyst on the marketing analytics team at Bellabeat, these are the key questions I want to answer:
Are our product users active enough?
Are there any noticeable trends derivable from our product users' exercise and sleep data?
Are our product users getting enough shut-eye?
Do more active users fall asleep faster?
How large is the difference in calories burnt between active users and less active ones?
Are there statistical results and relationships derivable from our users' data?
How could these trends help influence Bellabeat's marketing strategy?
This data set contains personal fitness tracker data from thirty-three Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activities, steps, and heart rate that can be used to explore the users' habits.
The data files can be downloaded from https://www.kaggle.com/arashnic/fitbit in CSV format.
After reviewing the CSV files, I selected some data sets for my analysis and excluded others, for the following reasons:
dailyActivity_merged.csv is a merged file of several disparate data files, so the separate daily files are redundant.
Some fields do not have units of measurement, so I cannot make sense of the data,
e.g. the Value column in heartrate_seconds, and TotalIntensity or AverageIntensity in hourlyIntensities_merged.csv,
the Intensity column in minuteIntensitiesNarrow_merged, and the METs column in minuteMETsNarrow_merged.
Some fields are too granular,
e.g. calories per hour, where the total calories per day in dailyActivity_merged.csv is easier to understand,
steps per hour in hourlySteps_merged.csv, where the total steps per day in dailyActivity_merged.csv is easier to understand,
and calories per minute in minuteCaloriesNarrow_merged.
Some data is incomplete, such as weightLogInfo_merged, which has only 8 distinct users.
dailyActivity_merged.csv and sleepDay_merged.csv are therefore the only data sets used in my analysis, as they center on activity and sleep patterns.
However, we have to take a few liberties in interpreting some units of measurement:
Distance recorded is assumed to be in kilometers.
Calories recorded refer to calories burnt, not calories consumed.
Creating the necessary tables for daily_activity, daily_calories (as a test), daily_steps (as a test) and sleep_day, with column names and data types, in PostgreSQL.
Formatting the date columns of the CSV files to YYYY-MM-DD format before importing them into the created tables.
Importing the CSV data into the respective tables via the pgAdmin interface.
Altering the id column data types of the 2 tables so that IDs are treated as discrete identifiers rather than continuous numbers.
Creating the dailyCalories and dailySteps tables as a comparison test.
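The date-formatting step can be sketched in a few lines of Python. This is only an illustration: it assumes the raw CSV dates look like M/D/YYYY, optionally followed by a time of day, and the helper name to_iso_date is hypothetical rather than part of the original workflow.

```python
from datetime import datetime


def to_iso_date(raw: str) -> str:
    """Convert 'M/D/YYYY' (optionally followed by a time) to 'YYYY-MM-DD'.

    Assumption: the source CSV stores dates month-first, e.g. '4/12/2016'
    or '4/12/2016 12:00:00 AM'. Adjust the format string if that is wrong.
    """
    date_part = raw.strip().split(" ")[0]  # drop any time-of-day suffix
    return datetime.strptime(date_part, "%m/%d/%Y").strftime("%Y-%m-%d")


print(to_iso_date("4/12/2016"))
print(to_iso_date("4/12/2016 12:00:00 AM"))
```

Converting before import means the PostgreSQL date columns can be loaded without a custom date style setting.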
As dailyActivity_merged.csv is a combination of several disparate files, let's run some tests to check that the values in the separate files are indeed present in dailyActivity_merged.csv.
Using the EXCEPT operator, the aim is to find rows that exist in one table but not in the other. The results show that all data in the separate steps and calories CSV files are already in the combined daily_activity table.
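The EXCEPT check can be reproduced in miniature as follows. This sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (both support EXCEPT for this kind of set difference). The column names and sample rows are made up for illustration and are not taken from the real tables.

```python
import sqlite3


def rows_only_in_steps(conn):
    """Return rows present in daily_steps but absent from daily_activity."""
    query = """
        SELECT id, activity_day, step_total FROM daily_steps
        EXCEPT
        SELECT id, activity_date, total_steps FROM daily_activity
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_activity (id TEXT, activity_date TEXT,
                                 total_steps INTEGER, calories INTEGER);
    CREATE TABLE daily_steps (id TEXT, activity_day TEXT, step_total INTEGER);
    -- Hypothetical sample rows
    INSERT INTO daily_activity VALUES ('U1', '2016-04-12', 9000, 2000),
                                      ('U1', '2016-04-13', 7000, 1900);
    INSERT INTO daily_steps VALUES ('U1', '2016-04-12', 9000),
                                   ('U1', '2016-04-13', 7000);
""")

# Every steps row has a match, so the difference is empty.
print(rows_only_in_steps(conn))

# Add a row that exists only in daily_steps; EXCEPT now surfaces it.
conn.execute("INSERT INTO daily_steps VALUES ('U2', '2016-04-12', 500)")
print(rows_only_in_steps(conn))
```

An empty result in one direction only shows that the smaller table is contained in the larger one, which is the property this test was after.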
Creating a VIEW by joining the activity and sleep tables for later analysis.
33 distinct IDs have activity recorded.
24 distinct IDs have sleep patterns recorded.
9 of the 33 IDs did not record sleep details.
All 24 IDs in the sleep records are among the 33 IDs with daily activities recorded.
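A scaled-down sketch of the view and the ID counts is shown here, again using sqlite3 as a stand-in for PostgreSQL. The view name activity_sleep, the column names and the three sample users are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_activity (id TEXT, activity_date TEXT, total_steps INTEGER);
    CREATE TABLE sleep_day (id TEXT, sleep_date TEXT, minutes_asleep INTEGER);
    -- Hypothetical sample rows: three users with activity, two with sleep
    INSERT INTO daily_activity VALUES ('U1', '2016-04-12', 9000),
                                      ('U2', '2016-04-12', 4000),
                                      ('U3', '2016-04-12', 12000);
    INSERT INTO sleep_day VALUES ('U1', '2016-04-12', 410),
                                 ('U3', '2016-04-12', 450);
    -- Join on both user and day so each row pairs activity with that night's sleep
    CREATE VIEW activity_sleep AS
        SELECT a.id, a.activity_date, a.total_steps, s.minutes_asleep
        FROM daily_activity AS a
        JOIN sleep_day AS s
          ON s.id = a.id AND s.sleep_date = a.activity_date;
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


activity_ids = scalar("SELECT COUNT(DISTINCT id) FROM daily_activity")
sleep_ids = scalar("SELECT COUNT(DISTINCT id) FROM sleep_day")
# IDs with activity but no sleep records
without_sleep = scalar(
    "SELECT COUNT(*) FROM (SELECT id FROM daily_activity EXCEPT SELECT id FROM sleep_day)"
)
# IDs with sleep records but no activity (expected to be zero)
sleep_only = scalar(
    "SELECT COUNT(*) FROM (SELECT id FROM sleep_day EXCEPT SELECT id FROM daily_activity)"
)
print(activity_ids, sleep_ids, without_sleep, sleep_only)
```

On the real data the same four queries would be expected to return 33, 24, 9 and 0.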
According to the article linked below, the CDC recommends a minimum of 10,000 steps per day. Based on the average number of steps per ID, only 7 users achieved the milestone while the rest fell short of the recommendation, with the median and average hovering around 7,500 steps.
https://www.medicalnewstoday.com/articles/how-many-steps-should-you-take-a-day#for-weight-loss
Sleep and active-minute durations seem to increase in tandem over Fridays and Saturdays. The two then diverge on Sundays, when users catch the most sleep and are the least active, possibly because of the coming workday. A day of relaxing to compensate for the coming Monday blues?
The CDC recommends at least 7 hours (420 minutes) of sleep per day. If the data is accurate, only half of the 24 users get enough sleep per day.
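The comparison against a daily target can be sketched as follows: average each user's daily values, then count how many averages reach the threshold. The function name and the sample records are hypothetical. The 420-minute sleep target is the one quoted above.

```python
from collections import defaultdict
from statistics import mean

RECOMMENDED_SLEEP_MIN = 420  # 7 hours, as quoted above


def share_meeting_target(records, target):
    """Fraction of users whose per-user daily average reaches the target.

    records: iterable of (user_id, daily_value) pairs.
    """
    per_user = defaultdict(list)
    for user_id, value in records:
        per_user[user_id].append(value)
    averages = {user: mean(values) for user, values in per_user.items()}
    meeting = sum(1 for avg in averages.values() if avg >= target)
    return meeting / len(averages)


# Hypothetical sample: U1 averages 430 minutes, U2 averages 385 minutes.
sample = [("U1", 400), ("U1", 460), ("U2", 380), ("U2", 390)]
print(share_meeting_target(sample, RECOMMENDED_SLEEP_MIN))
```

The same helper works for the step comparison by passing (user_id, daily_steps) pairs and a target of 10,000.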
Based on the CDC's recommended minutes of physical activity for adults, I categorize the 33 users by their respective very active, fairly active and lightly active minutes per week.
The 33 users are classified into 3 categories: those with more than 70 minutes of intense activity per week are 'Very Active', those with more than 140 minutes of fairly intense activity per week are 'Moderately Active', and those with more than 225 minutes of light-intensity activity per week are 'Lightly Active'.
I lowered the minutes criteria per category instead of strictly following the CDC guidelines because they are only a recommended guide. Some users almost reach the recommended number of minutes in a category and also record activity in other categories each week, so it is the overall time spent being active that matters rather than the label of any one category.
https://www.cdc.gov/physicalactivity/basics/adults/index.htm
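The classification rule above could be written as a small function. Two points are assumptions rather than something stated in the analysis: the checks are applied in order from most to least intense, and a user who meets none of the thresholds falls into a hypothetical 'Unclassified' bucket.

```python
def classify_user(very_active_min, fairly_active_min, lightly_active_min):
    """Assign an activity category from weekly minutes at each intensity.

    Thresholds (70 / 140 / 225 minutes per week) are the lowered criteria
    described above. Order of precedence is an assumption.
    """
    if very_active_min > 70:
        return "Very Active"
    if fairly_active_min > 140:
        return "Moderately Active"
    if lightly_active_min > 225:
        return "Lightly Active"
    return "Unclassified"  # hypothetical fallback, not in the original grouping


print(classify_user(90, 30, 600))   # intense minutes take precedence
print(classify_user(20, 150, 600))
print(classify_user(0, 10, 300))
```

Checking the most intense category first means a user with both high intense and high light minutes is labelled by their most demanding activity.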
From the graph, we see the difference in the average minutes it takes to fall asleep between the 3 groups of product users. The graph points to one of the many benefits of exercising: it helps us fall asleep faster, thus maximizing our time spent in bed.
Based on the collated data and this analysis alone, without taking into consideration factors like weight, metabolic rate, body mass index etc., we can see a distinct distribution of calories burnt per user activity group.
Analysis of the data supports the widely acknowledged medical fact that the more a user travels (distance and steps), the more calories they burn.
Based on the regression line equations above:
Calories burnt, based on distance travelled = 118.701 × (distance travelled) + 1654.31
Calories burnt, based on steps taken = 0.0840272 × (steps taken) + 1664.52
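As a worked example, the two fitted lines can be wrapped in small functions and evaluated for a sample day. The coefficients are the ones reported above. The function names are hypothetical, and distance is assumed to be in kilometers, in line with the earlier assumption about units.

```python
def calories_from_distance(distance_km: float) -> float:
    """Predicted daily calories burnt from distance travelled (km assumed)."""
    return 118.701 * distance_km + 1654.31


def calories_from_steps(steps: float) -> float:
    """Predicted daily calories burnt from steps taken."""
    return 0.0840272 * steps + 1664.52


# A 5 km day: 118.701 * 5 + 1654.31
print(round(calories_from_distance(5), 3))
# A 10,000-step day: 0.0840272 * 10000 + 1664.52
print(round(calories_from_steps(10_000), 3))
```

The intercepts (about 1,650 calories) are the predicted burn for a day with no recorded movement, so each extra kilometer adds roughly 119 calories and each extra 1,000 steps roughly 84.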
The correlation coefficient between distance and calories is 0.6466, which indicates a moderately strong positive correlation between distance travelled and calories burnt.
The correlation coefficient between steps and calories is 0.5929, which also indicates a moderately strong positive correlation between steps taken and calories burnt.
Both p-values are less than 0.05, which means the results are statistically significant.
The R-squared values in the graphs above indicate that:
41% of the variance in calories burnt (the dependent variable) is explained by distance travelled (the independent variable).
35.1% of the variance in calories burnt is explained by steps taken.
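For a regression with a single predictor, R-squared is simply the square of the Pearson correlation coefficient, which ties these percentages back to the coefficients reported earlier. The sketch below computes the coefficient from scratch and squares the reported values. The pearson_r helper and its toy input are illustrative and were not part of the original analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data on a perfect straight line, so r is 1 (up to rounding).
print(pearson_r([1, 2, 3], [2, 4, 6]))

# Squaring the reported coefficients gives the share of variance explained.
for label, r in [("distance", 0.6466), ("steps", 0.5929)]:
    print(label, r ** 2)
```

Squaring 0.6466 and 0.5929 gives values close to the 41% and 35.1% read off the graphs.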
Analysis of sedentary minutes and calories shows a negative linear relationship, which is expected, as more time spent being sedentary means fewer calories are burnt.
However, the correlation coefficient between sedentary minutes and calories is -0.1117, which indicates only a weak correlation between the 2 variables.
The p-value is less than 0.05, which means the result is statistically significant.
The R-squared value in the graph above
indicates that only 1.2% of the variance in calories burnt can be explained by sedentary minutes.
Sedentary time might not imply an unhealthy lifestyle; it may simply be time spent sleeping or sitting down to work. Thus these 2 variables do not have a strong correlation with each other.
We want to see if there is a relationship between total active minutes and time to fall asleep. As the earlier graphs show that more active users fall asleep faster, a negative linear relationship is expected.
Based on the regression line equation above:
To find out the minutes to fall asleep based on active minutes : -0.046663 times (active mins) + 5103081
The correlation coefficient between active minutes and time to fall asleep is -0.092, which indicates only a weak correlation between the 2 variables.
The p-value is more than 0.05, which means the result is not statistically significant.
The R-squared value in the graph above
indicates that only about 0.8% of the variance in time to fall asleep can be explained by active minutes (a correlation coefficient of -0.092 squared is roughly 0.008).
Total active time doesn't seem to have much of a relationship with time taken to fall asleep. Perhaps there is more science behind the benefits of being active than can be explained by comparing 2 variables alone.
1) The data collected from the sample of 33 users shows that their activity levels still fall short of the recommended amounts in terms of:
Number of steps taken
Amount of sleep
2) However, analysis of the data supports the widely acknowledged medical fact that the more a user travels (distance and steps), the more calories they burn.
3) The Very Active, Moderately Active and Lightly Active groups clearly show a marked difference in calories burnt.
4) The group averages suggest that higher-intensity exercise, or indeed any form of exercise, translates to more productive time in bed (referring to the time to fall asleep and not frisky behavior). Better sleep leads to improved overall health, so this is an important factor that Bellabeat can tap into in its marketing campaigns moving forward.
1) Based on a 2016 Gartner survey on wearable devices, the abandonment rate is 29% for smartwatches and 30% for fitness trackers, as users tend to find that the devices do not meet their expectations, get bored of them, or find that they break easily.
2) Adopt user-friendly, accessible menus that allow users to customize their devices' notifications according to different physical activity milestones.
3) Introduce products that look sportier and more stylish, and/or offer improved durability (e.g. waterproofing) and accuracy, in addition to Bellabeat's existing range of products, which are currently more suited to everyday wear than sports use.
4) Price products significantly lower than top brands while still offering value for money, targeting price-sensitive consumers. Bellabeat might consider accepting lower margins in exchange for higher sales volume, as we are competing against stronger and more established brands worldwide.
5) Form an online Bellabeat community where fellow fitness tracker users and enthusiasts can support and encourage one another.
6) Bellabeat can engage users with incentives like shopping vouchers and gamification through points, badges, leaderboards and avatars.
7) Increase battery life and allow devices to be charged via body heat, solar energy, movement etc.
1) Bellabeat can work with government boards (e.g. Health Promotion Boards) and insurance companies on roadshows at locations like shopping malls to promote the benefits of a healthy lifestyle and a healthy population, which in turn lead to lower insurance claims and medical expenses.
2) Bellabeat can also work with gyms and fitness influencers to promote their products and cause.
3) Tap into holidays or shopping events like Singles' Day or Black Friday to boost sales.
4) Emphasize the analysis results on exercising and improved sleep, which can lead to overall health benefits.
5) At the same time, Bellabeat must not forget to emphasize data security and privacy for its users.