How can a wellness company play it smart?
Parikshit C Mukerjee
Bellabeat, a wellness company, is interested in how consumers use their wearable technology, specifically Fitbit trackers.
My analysis will focus on behavioral analytics of Fitbit users and accordingly inform future marketing strategies for Bellabeat.
Analysis Overview
Ask: Define the business problem and ask questions the analysis will aim to answer.
Prepare: Identify, evaluate, and select the necessary data sources, in this case, Fitbit Fitness Tracker Data from Kaggle, supplied by Mobius.
Process: Clean and manipulate the data to make it suitable for analysis. This includes handling missing values, filtering irrelevant data, and merging additional datasets.
Analyze: Examine the data to find patterns, trends, and insights. This will involve calculating statistics, creating pivot tables, and using various other analytical techniques to understand user behaviors and preferences.
Share: Present findings in a comprehensive report that includes a summary of analysis, supporting visualizations, and key findings.
Act: Provide high-level recommendations based on said analysis to guide Bellabeat's marketing strategies.
Ask
Key Deliverable: The business task is to analyze smart device usage data to uncover trends and patterns, providing valuable consumer behavior insights to drive growth opportunities for Bellabeat. Data source: Fitbit Fitness Tracker Data on Kaggle.
Guiding Questions:
● What are some trends in smart device usage?
● How could these trends apply to Bellabeat customers?
● How could these trends help influence Bellabeat marketing strategy?
Key Stakeholder Considerations:
Urška Sršen (Co-founder and Chief Creative Officer): is interested in unlocking new growth opportunities for Bellabeat by understanding consumer behavior. The insights gained from the analysis will help in shaping the overall marketing strategy.
Sando Mur (Co-founder and Mathematician): is involved in strategic decisions based on the analysis outcomes. They may be particularly interested in any data-driven recommendations for product development or enhancement.
Bellabeat Marketing Analytics Team: will use the insights to inform future marketing campaigns. They may be particularly interested in any customer segmentation insights for better ads targeting.
Bellabeat Executive Team: The broader executive team will be interested in high-level recommendations for the company's overall direction and success.
Prepare
The data is sourced from Kaggle, a reputable platform for sharing and accessing datasets. Kaggle provides an organized environment for data science projects, ensuring transparency and accessibility. The data originates from responses provided by thirty eligible Fitbit users who consented to the submission of personal tracker data on metrics including physical activities, heart rate measurements, and sleep patterns. The dataset is released under CC0: Public Domain by Mobius.
Below are further details describing the dataset on Kaggle.
"This dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviours / preferences." - Fitbit Fitness Tracker Data on Kaggle
Data Exploration:
The dataset includes multiple files, each focusing on specific aspects of health and fitness.
To begin working, I uploaded all datasets to Google Sheets.
An exploratory scan of the data reveals:
- long and wide form organization across files
- activity intensity (distance & time), calories and steps data in daily, hourly and minute form.
- heart rate data
- weight data
- sleep data
Preliminary checks were performed to verify data format and data reliability, as shown in the supplemental sheet below.
Process
The key data files I will be working with for my analysis include:
dailyActivity_merged.csv
dailyCalories_merged.csv
dailyIntensities_merged.csv
dailySteps_merged.csv
sleepDay_merged.csv
For Analysis 1, I will focus on the following ‘daily’ dataset.
DailyActivity_merged: contains daily activity data from Fitbit users, with the following columns:
Id: A unique identifier for each user.
ActivityDate: The date of the recorded activities.
TotalSteps: The total number of steps taken by the user on that day.
Calories: The total calories burned by the user on that day.
TotalDistance: The total distance covered by the user on that day, measured in units (likely kilometers or miles).
TrackerDistance: The distance tracked by the device.
LoggedActivitiesDistance: The distance for activities specifically logged by the user (as opposed to automatically detected by the device).
VeryActiveDistance: The distance covered during very active minutes.
ModeratelyActiveDistance: The distance covered during moderately active minutes.
LightActiveDistance: The distance covered during light activity.
SedentaryActiveDistance: The distance covered during sedentary activity.
VeryActiveMinutes: Minutes spent in very active activity.
FairlyActiveMinutes: Minutes spent in fairly active activity.
LightlyActiveMinutes: Minutes spent in lightly active activity.
SedentaryMinutes: Minutes spent in sedentary activity.
To confirm the number of users, I used the COUNTUNIQUE function, which revealed 33 users in the dataset. This was an interesting find, as it is three more than the thirty users stated on Kaggle. It raises the question of whether the additional three users consented to their data being shared.
To confirm the data collection timeline, I used the sort and filter option to review the dates. The filtering shows the DailyActivity_merged data was collected on days ranging from 3/12/2016 to 5/12/2016 - which aligns with the Kaggle information.
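The two checks above were done in Google Sheets. As a rough illustration only, the same checks could be sketched in Python as follows. The sample rows and the month/day/year date format are assumptions made for the example, not values taken from the actual file.

```python
from datetime import datetime

# Hypothetical sample rows in the shape of DailyActivity_merged (Id, ActivityDate).
rows = [
    {"Id": "1503960366", "ActivityDate": "3/12/2016"},
    {"Id": "1503960366", "ActivityDate": "3/13/2016"},
    {"Id": "1624580081", "ActivityDate": "4/12/2016"},
    {"Id": "1644430081", "ActivityDate": "5/12/2016"},
]


def count_unique_users(records):
    """Count distinct Id values, like COUNTUNIQUE over the Id column."""
    return len({r["Id"] for r in records})


def date_range(records, fmt="%m/%d/%Y"):
    """Return the earliest and latest ActivityDate (assumed month/day/year)."""
    dates = [datetime.strptime(r["ActivityDate"], fmt).date() for r in records]
    return min(dates), max(dates)


n_users = count_unique_users(rows)
first_day, last_day = date_range(rows)
print(n_users, first_day, last_day)
```

Parsing the dates before taking the minimum and maximum avoids the text-sorting problem where "5/12/2016" would sort after "12/1/2016".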
Within the daily datasets, there were also individual long-format files for
dailyCalories_merged
dailyIntensities_merged
dailySteps_merged
To further check for data integrity in DailyActivity_merged, the primary file I will be working with, I manually re-created the DailyActivity_merged dataset by merging the above long-format files together and then comparing the two.
Findings: There were minor variations between the two datasets, which appear to stem from rounding in the DailyActivity_merged file. This supports the reliability of the DailyActivity_merged dataset. (Analysis continues in Analysis 1 - Introduction.)
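The re-creation check was done by hand in Google Sheets. A minimal sketch of the same idea in Python is shown below, assuming each file can be keyed by (Id, date) and that the column names match across files. The sample values, the helper names, and the 0.01 tolerance are illustrative assumptions.

```python
import math

KEY = ("1503960366", "4/12/2016")  # hypothetical (Id, date) key

# Hypothetical one-row versions of the three long-format files.
daily_calories = {KEY: {"Calories": 1985}}
daily_steps = {KEY: {"TotalSteps": 13162}}
daily_intensities = {KEY: {"VeryActiveDistance": 1.88}}

# Hypothetical row from the combined file, with a small rounding difference.
daily_activity = {
    KEY: {"Calories": 1985, "TotalSteps": 13162, "VeryActiveDistance": 1.8799999952}
}


def merge_on_key(*tables):
    """Combine several tables that share the same (Id, date) keys."""
    merged = {}
    for table in tables:
        for key, values in table.items():
            merged.setdefault(key, {}).update(values)
    return merged


def find_mismatches(rebuilt, reference, tol=0.01):
    """List (key, column) pairs whose values differ by more than tol."""
    mismatches = []
    for key, values in rebuilt.items():
        ref = reference.get(key)
        if ref is None:
            mismatches.append((key, "missing row"))
            continue
        for col, val in values.items():
            if col not in ref or not math.isclose(
                val, ref[col], rel_tol=0.0, abs_tol=tol
            ):
                mismatches.append((key, col))
    return mismatches


rebuilt = merge_on_key(daily_calories, daily_steps, daily_intensities)
print(find_mismatches(rebuilt, daily_activity))           # within tolerance
print(find_mismatches(rebuilt, daily_activity, tol=0.0))  # exact comparison
```

Comparing with a small tolerance separates genuine discrepancies from rounding differences, which is the distinction the finding above relies on.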
For Analysis 2, I will focus on the following ‘daily’ dataset.
sleepDay_merged: contains daily sleep data from Fitbit users, with the following columns:
Id: A unique identifier for each user.
SleepDay: The date of the recorded sleep.
TotalSleepRecords: Count of sleep activity recorded.
TotalMinutesAsleep: Minutes spent sleeping.
TotalTimeInBed: Minutes spent sleeping and lying awake.
To confirm the number of users, I used the COUNTUNIQUE function, which revealed 24 users in the dataset. There are therefore fewer users with sleep records than with daily activity records, suggesting that some users may not be engaging consistently with the sleep features or are perhaps taking off their devices altogether during sleep hours.
Performed data cleansing for both datasets, as shown in Process - Supplemental sheet below.
Analysis 1 - Introduction
For Analysis 1, I will focus on the DailyActivity_merged dataset.
Given the DailyActivity_merged dataset shows each day’s data for a particular user for:
- Steps Covered
- Active Distance Covered
- Active Minutes Spent
- Calories Burnt
and within these categories, the Active Distance and Active Minutes data are further broken down into:
- Active Distance Covered
  - LightActiveDistance
  - ModerateActiveDistance
  - VeryActiveDistance
- Active Minutes Spent
  - LightActiveMins
  - ModerateActiveMins
I plan to explore the following directions of analysis, in alignment with the business task: Understanding how consumers use their smart devices to inform future marketing strategies for Bellabeat.
Correlation Analysis
Analysis 1A
Explore correlations between different variables, such as the relationship between steps and calories burned, to gain preliminary insights into the relationships between the different metrics.
User Behavior Analysis
Analysis 1B
Activity Patterns: Explore the daily patterns in total steps, minutes, and calories burned to understand when users are most and least active.
Analysis 1C
Activity Types: Explore the time spent in different activity intensities (very active, moderately active, lightly active, sedentary) to understand the users' lifestyle and fitness levels.
Temporal Analysis
Analysis 1D
Trend Analysis: Explore trends over time to see if activity levels are increasing, decreasing, or remaining stable.
Analysis 1E
Time of Day Analysis: Dive deeper into when users are most active during the day.
Segmentation Analysis
Analysis 1F
User Segmentation: Based on activity, segment users into categories such as 'Highly Active', 'Moderately Active', and 'Barely Active' to explore segment distributions.
Analysis 1A - Correlation Analysis
Description:
This analysis will explore correlations between different variables, such as the relationship between steps and calories burned, or the impact of various activity intensities on total distance covered. Understanding these correlations can provide preliminary insights into the relationships between the different metrics.
Data Manipulation Log:
Using the CORREL function, I examined the correlation coefficients for several pairs of variables, including:
Steps and Calories
Steps and Total Distance
Calories and Total Distance
Calories and different Activity Distance levels
Very Active Distance and Very Active Minutes
Correlation Coefficients:
Steps:Calories - 0.571322648
Steps:TotalDistance - 0.9869796408
Calories:TotalDistance - 0.6393753377
Calories:LightDistance - 0.5286872061
Calories:VeryActiveDistance - 0.5111155465
VeryActiveDistance:VeryActiveMins - 0.8937687248
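For readers who want to reproduce coefficients like these outside Google Sheets, the Pearson correlation that CORREL computes can be sketched in a few lines of Python. The sample values below are hypothetical and chosen only to show the calculation; they are not drawn from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values: distance is an exact multiple of steps here,
# so the coefficient comes out at (approximately) 1.
steps = [1000, 5000, 10000, 15000]
distance = [s * 0.0007 for s in steps]
r_steps_distance = pearson(steps, distance)
print(round(r_steps_distance, 4))
```

A coefficient near 1 or -1 indicates a strong linear relationship, while values around 0.5 to 0.6, like Steps:Calories above, indicate a moderate one.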
Observations:
There appears to be a high correlation (0.9869796408) between steps taken and total distance covered, indicating that users who take more steps tend to cover longer distances as expected. (See Figure 1)
The correlation coefficient between steps and calories (0.571322648) indicates a moderate positive correlation, suggesting that a higher step count is a moderate indicator of increased calorie expenditure. (See Figure 2)
The correlation coefficient between very active distance and very active minutes (0.8937687248) indicates a strong positive correlation, suggesting that users who engage in more very active minutes also cover longer distances at a very active intensity.
Interestingly, the correlation between calories burned and light-activity distance (0.5286872061) is slightly stronger than the correlation between calories burned and VeryActiveDistance (0.5111155465). This suggests that distance covered at a lighter pace is about as strongly associated with calorie expenditure as distance covered at higher intensity, although correlation alone does not show that the two are equally effective. (See Figures 3 & 4)
Figure 1
Figure 2
Figure 3
Figure 4
Analysis 1B - Activity Pattern Analysis
Description: This analysis will explore the daily patterns for total steps, calories burned, and activity minutes spent to gain preliminary insights into the users' activity habits.
Data Manipulation Log:
Created a new column titled "DayofWeek" to convert the date into days using the formula =TEXT(Cell, "DDD").
Created a Pivot Table to summarize the data by activity minutes type.
Added DayofWeek to the Rows to analyze daily patterns without specific user IDs.
Added VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, and SedentaryMinutes to the Values section, setting them to sum.
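The pivot steps above can be approximated in Python as a group-and-sum by day of week. This is only a sketch: the sample rows are hypothetical, only two of the four minute columns are shown, and the month/day/year date format is an assumption.

```python
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical rows in the shape of DailyActivity_merged.
rows = [
    {"ActivityDate": "4/12/2016", "VeryActiveMinutes": 25, "SedentaryMinutes": 728},
    {"ActivityDate": "4/13/2016", "VeryActiveMinutes": 21, "SedentaryMinutes": 776},
    {"ActivityDate": "4/19/2016", "VeryActiveMinutes": 29, "SedentaryMinutes": 775},
]


def day_of_week(date_text, fmt="%m/%d/%Y"):
    """Return a three-letter day name, similar to =TEXT(cell, "DDD")."""
    return DAY_NAMES[datetime.strptime(date_text, fmt).weekday()]


def pivot_sum_by_day(records, value_cols):
    """Sum each value column per day of week, across all users."""
    totals = {day: dict.fromkeys(value_cols, 0) for day in DAY_NAMES}
    for r in records:
        day = day_of_week(r["ActivityDate"])
        for col in value_cols:
            totals[day][col] += r[col]
    return totals


pivot = pivot_sum_by_day(rows, ["VeryActiveMinutes", "SedentaryMinutes"])
print(pivot["Tue"], pivot["Wed"])
```

Because the values are summed rather than averaged, days that occur more often in the collection window will show larger totals, which is worth keeping in mind when comparing days.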
Observations:
Tuesday exhibits the highest levels of activity, with users recording the most total steps, calories burned, and active minutes. However, this day also sees the highest sedentary minutes, indicating a need to manage sedentary behavior despite high activity levels. (See Figures 5-8)
Following Tuesday, Wednesday and Thursday show sustained high activity levels, aligning with the typical 9-5 workweek. Users record over a million steps on both these days, suggesting good engagement in physical activities during workdays. (See Figures 5-8)
Saturdays and Sundays show lower total steps, calories burned, and active minutes compared to weekdays, indicating a more relaxed and leisurely tendency over the weekend. (See Figures 5-8)
Recommendations:
Given Tuesdays, Wednesdays, and Thursdays record high activity, these offer the optimal days for engaging users, for instance, through motivational messages or challenges to build on their already active routines. At the same time, these days also record the highest sedentary behavior, which can be targeted in parallel with such engagement strategies. For instance, we can promote consistent movements by introducing hourly stretching prompts targeted on these days.
Since weekends show decreased activity, engagement strategies on Saturdays and Sundays can be aligned to promote lighter or more recreational activities to help users recover from their hectic work week while simultaneously limiting the tendency for high sedentary behavior.
Figure 5
Figure 6
Figure 7
Figure 8
Analysis 1C - Activity Type Analysis
Description: This analysis will examine the distribution of time spent in different activity minutes (very active, moderately active, lightly active, sedentary) to gain further insights into users' lifestyle and fitness levels.
Data Manipulation Log:
Created a new column titled "DayofWeek" to convert the date into days using the formula =TEXT(Cell, "DDD").
Created a Pivot Table to summarize the data by day of the week.
Dragged DayofWeek to the Rows area and set it to display the day of the week.
Added TotalSteps, VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes (for total active minutes), and Calories to the Values area, setting them to sum.
Observations:
Sedentary activities are the most prevalent by far, followed by Lightly Active Minutes. The consistent presence of Lightly Active Minutes across all days suggests a tendency towards light-intensity activities among the user base. (See Figure 9)
Tuesday exhibits the highest total active minutes, followed by Wednesday and Monday, which corroborates the prior analysis and suggests that the beginning of the week is generally more active and aligned with a typical work week. (See Figure 10)
While Tuesdays and Wednesdays have relatively higher active minutes, they also exhibit higher overall sedentary time, which may indicate prolonged inactivity in between bursts of activity. (See Figure 9)
Recommendations:
The observations underscore a significant opportunity to engage users in reducing their sedentary behavior while also moving them from moderate activities towards more vigorous physical activities. Personalizing engagement strategies, such as initiating vigorous challenges from Monday to Wednesday while incorporating light, fun activities during weekends, could effectively leverage the observed behavioral patterns to promote more vigorous activity. In conjunction, implementing targeted messages aimed at reducing sedentary behavior during weekdays may help to limit inactivity.
Figure 9
Figure 10
Analysis 1D - Trend Analysis
Description: This analysis will examine trends over time to assess whether activity levels are increasing, decreasing, or remaining stable.
Data Manipulation Log:
Sorted ActivityDate in ascending order to accurately visualize trends over time.
Created a Pivot Table to summarize the data by ActivityDate.
Dragged ActivityDate to the Rows area and set it to display the day of the week.
Added TotalSteps and Calories to the Values area, setting them to sum for analysis over time.
Observations:
The results depict fluctuating activity levels rather than a clear, consistent upward or downward trend. Spikes in TotalSteps on certain days (e.g., 16-Apr and 23-Apr) are followed by drops on other days (e.g., 17-Apr and 1-May), indicating wide variability in the user base. LightlyActiveMinutes appear to correlate with the fluctuations in TotalSteps, suggesting that user steps are largely accounted for during LightlyActive engagement. (See Figure 11 & 13)
Despite fluctuations in Active Minutes and Total Steps, SedentaryMinutes remain relatively stable, with minor fluctuations. This stability in sedentary behavior underscores prior opportunities for targeted interventions to promote a more active lifestyle. (See Figure 14)
Similar to Sedentary Minutes, FairlyActiveMinutes and VeryActiveMinutes also demonstrate some degree of stability, suggesting low but consistent engagement in vigorous activities. (See Figure 13)
Recommendations:
The data reveals fluctuating activity levels among users, with notable spikes in total steps and lightly active minutes on certain days while displaying a relatively stable trend for sedentary minutes. Investigating the spikes in activity on specific days further may offer more insights into user motivation, in turn allowing Bellabeat to promote such spikes on other days as well. Given the varying engagement in light activities, but the relatively low yet stable engagement in fairly active and very active minutes, Bellabeat can leverage strategies that may gradually introduce and sustain more vigorous activities, for instance through personalized challenges, to help users transition into more active lifestyles.
Figure 11
Figure 12
Figure 13
Figure 14
Analysis 1E - Time of Day Analysis
Description: This analysis will explore when users are most and least active throughout the day to explore optimizing the timing of engagement strategies.
Data Manipulation Log:
Merged hourly datasets for TotalSteps, CaloriesBurned, TotalIntensity, and AverageIntensity.
Grouped hourly dates into a 12-hour format.
Sorted ActivityDate in ascending order to accurately visualize the trend over time.
Created a Pivot Table to summarize the data by ActivityHour.
Dragged ActivityHour to the Rows area and set it to display the Time of Day (12HR Format).
Added TotalSteps, CaloriesBurned, TotalIntensity, and AverageIntensity to the Values area, setting them to sum.
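As a rough Python counterpart to the hourly pivot described above, the sketch below groups hourly rows by time of day in 12-hour format and sums one value column. The sample rows and the timestamp format ("month/day/year hour:minute:second AM/PM") are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly rows; the timestamp format is an assumption.
hourly_rows = [
    {"ActivityHour": "4/12/2016 7:00:00 AM", "TotalSteps": 500},
    {"ActivityHour": "4/13/2016 7:00:00 AM", "TotalSteps": 700},
    {"ActivityHour": "4/12/2016 1:00:00 PM", "TotalSteps": 200},
    {"ActivityHour": "4/12/2016 12:00:00 AM", "TotalSteps": 0},
]


def hour_label(hour_24):
    """Convert an hour 0-23 into a 12-hour label such as '7 AM' or '1 PM'."""
    suffix = "AM" if hour_24 < 12 else "PM"
    return f"{hour_24 % 12 or 12} {suffix}"


def sum_by_hour(records, value_col, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Sum one value column per hour of day, across all dates and users."""
    totals = defaultdict(int)
    for r in records:
        hour = datetime.strptime(r["ActivityHour"], fmt).hour
        totals[hour_label(hour)] += r[value_col]
    return dict(totals)


steps_by_hour = sum_by_hour(hourly_rows, "TotalSteps")
print(steps_by_hour)
```

Grouping on the parsed hour rather than on the raw text keeps "7:00:00 AM" on different dates in the same bucket.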
Observations:
Peak Hours (See Figure 15 & 16)
Morning Activity: A notable increase in activity is observed starting from 6 AM, with a significant peak around 7-10 AM. This suggests that many users are most active in the early morning, potentially engaging in morning workouts or active commutes.
Evening Activity: Another peak in activity is observed from 5 PM to 7 PM, indicating that users also engage in physical activities after work or in the evening.
Intensity: Trends in total intensity closely mirror step counts, indicating that peak times are not only more active in terms of steps but also in intensity.
Low Activity Periods (See Figure 15 & 16)
Midday Dip: A noticeable dip in activity occurs around midday from 1 PM to 3 PM, possibly corresponding to lunchtime or a general midday lull.
Late Night to Early Morning: The lowest activity levels are consistently seen from 12 AM to 5 AM, aligning with a typical sleeping period.
Recommendations:
Morning Engagement: Given high activity levels in the early morning, sending motivational messages or challenges just before this time could encourage users to start and sustain their days actively.
Evening Challenges: The evening activity peak presents another opportunity for engagement, prompting users to participate in activities or challenges to stay active after work.
Midday Motivation: Use the midday dip in activity as a prompt to remind users to take short active breaks, like stretching, to counteract the lull and promote better energy levels for the rest of the day.
Figure 15
Figure 16
Analysis 1F - Segmentation Analysis
Description:
This analysis will segment users based on their activity levels, specifically their total steps, into categories of 'Highly Active', 'Moderately Active', and 'Barely Active', in alignment with National Institutes of Health (NIH) research.
Data Manipulation Log:
Created a Pivot Table with Id as Rows.
Added TotalSteps to the Values area, setting it to calculate the sum (each row then represents one user's total steps over the 61-day period).
In alignment with the research article published by the National Institutes of Health (NIH) on the number of steps per day associated with a reduced risk of death over the following decade, I used the following segmentation criteria:
Highly Active: Users averaging 10,000 or more steps per day over the 61-day period (≥ 610,000 total steps).
Moderately Active: Users averaging more than 5,000 but fewer than 10,000 steps per day over the 61-day period (more than 305,000 and fewer than 610,000 total steps).
Barely Active: Users averaging 5,000 or fewer steps per day over the 61-day period (≤ 305,000 total steps).
Created a new column titled UserSegment with the following formula: =IF(C3 >= 610000, "Highly Active", IF(C3 > 305000, "Moderately Active", "Barely Active"))
Created a Pie Chart to visualize segments.
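The segmentation rule above can be expressed as a small function. The sketch below mirrors the spreadsheet formula's thresholds. The per-user totals and user names are hypothetical and serve only to exercise each branch.

```python
from collections import Counter

DAYS_IN_PERIOD = 61
HIGH_THRESHOLD = 10_000 * DAYS_IN_PERIOD  # 610,000 total steps
LOW_THRESHOLD = 5_000 * DAYS_IN_PERIOD    # 305,000 total steps


def segment(total_steps):
    """Classify a user's summed steps using the same cut-offs as the IF formula."""
    if total_steps >= HIGH_THRESHOLD:
        return "Highly Active"
    if total_steps > LOW_THRESHOLD:
        return "Moderately Active"
    return "Barely Active"


# Hypothetical per-user step totals over the period.
totals = {"user_a": 650_000, "user_b": 400_000, "user_c": 120_000, "user_d": 305_000}

segments = {user: segment(steps) for user, steps in totals.items()}
counts = Counter(segments.values())
shares = {name: 100 * n / len(segments) for name, n in counts.items()}
print(segments)
print(shares)
```

One design consideration: thresholds on summed steps implicitly assume every user has a record for each of the 61 days. If some users have fewer recorded days, dividing each user's total by their own number of recorded days and comparing against 5,000 and 10,000 directly would avoid classifying them lower only because of missing days.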
Observations:
A significant portion (81.8%) of the dataset falls into the 'Barely Active' segment, raising concerns that most users are not meeting the recommended activity levels for a healthy lifestyle. (See Figure 17)
Only a small percentage (18.2%) of users have been classified as 'Moderately Active', suggesting that while some individuals engage in regular physical activity, they too may not be reaching higher activity benchmarks. (See Figure 17)
There are no users classified as 'Highly Active', indicating an opportunity to support users in achieving higher levels of physical activity.
Recommendations:
Given the majority of the user base falls into the 'Barely Active' segment, this segment may benefit from tailored engagement strategies that offer achievable goals every day to instill consistency first, while 'Moderately Active' users may respond better to more challenging goals to push them up towards the 'Highly Active' segment.
The absence of 'Highly Active' users alongside the abundance of 'Barely Active' users underscores the need to proactively educate our user base on the health benefits of regular physical activity and on recommended benchmarks, such as those suggested by the National Institutes of Health. Therefore, incorporating educational content and offering tips for increasing daily step counts should be a key recommendation, especially for the 'Barely Active' users.
Further research is needed to understand the barriers faced by 'Barely Active' users and the motivations of 'Moderately Active' users to inform more effective engagement strategies and product enhancements.
Figure 17
Analysis 2 - Introduction
For Analysis 2, I will focus on the following dataset.
sleepDay_merged: contains daily sleep data from Fitbit users, with the following columns:
Id: A unique identifier for each user.
SleepDay: The date of the recorded sleep.
TotalSleepRecords: Count of sleep activity recorded.
TotalMinutesAsleep: Minutes spent sleeping.
TotalTimeInBed: Minutes spent sleeping and lying awake.
Based on the above data, I will undertake the following analyses to understand patterns and insights related to users' sleep habits.
Analysis 2A
Exploring Average Sleep Durations: This analysis will determine the average sleep duration of users in the dataset and identify individuals who may not be meeting the recommended 7-9 hours of sleep per night, as advised by the National Sleep Foundation.
Analysis 2B
Time in Bed vs. Actual Sleep: This analysis will compare TotalTimeInBed with TotalMinutesAsleep for insights into users' sleep patterns and identify potential issues related to falling asleep.
Analysis 2C
Sleep Consistency: This analysis will compare the variability in TotalMinutesAsleep and TotalTimeInBed for each user to assess sleep consistency. High variability may indicate irregular sleep patterns, potentially affecting sleep quality.
Analysis 2D
Sleep Record Frequency: This analysis will explore the frequency of sleep records for our sample to gain insights into users' sleep patterns. Multiple records, for instance, might indicate users experience sleep disruptions.
Analysis 2E
Sleep Day Analysis: This analysis will explore sleep data trends over a specified date range (SleepDay) to uncover patterns over specific days of the week. By analyzing sleep duration across different days of the week, insights can be gained into sleep behaviors, including variations between weekdays and weekends.
Analysis 2A - Average Sleep Durations
Description:
This analysis will determine the average sleep duration of users in the dataset and identify individuals who may not be meeting the recommended 7-9 hours of sleep per night, as advised by the National Sleep Foundation.
Data Manipulation Log:
Created a Pivot Table to summarize the data by UserId.
Added UserId to the Rows section.
Added TotalMinutesAsleep to the Values section, setting it to Average.
Added TotalSleepRecords to the Values section, setting it to COUNTA.
Calculated AverageHoursAsleep by dividing the Average TotalMinutesAsleep by 60.
Used the COUNTIF function to categorize users into those averaging less than 7 hours and those averaging 7 hours or more of sleep per night.
Created a Pie Chart to visualize the proportion of users meeting the recommended sleep duration versus those who are not.
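A compact Python sketch of the steps above is shown here: average minutes asleep per user, conversion to hours, and a count of users below the 7-hour mark. The user names and minute values are hypothetical.

```python
from collections import defaultdict

RECOMMENDED_MIN_HOURS = 7

# Hypothetical (Id, TotalMinutesAsleep) rows.
sleep_rows = [("user_a", 420), ("user_a", 480), ("user_b", 61)]


def average_hours_asleep(records):
    """Return {Id: (average hours asleep, number of sleep records)}."""
    minutes = defaultdict(list)
    for user_id, minutes_asleep in records:
        minutes[user_id].append(minutes_asleep)
    return {
        user_id: (sum(vals) / len(vals) / 60, len(vals))
        for user_id, vals in minutes.items()
    }


summary = average_hours_asleep(sleep_rows)
below = sum(1 for hours, _ in summary.values() if hours < RECOMMENDED_MIN_HOURS)
meets = len(summary) - below
print(summary)
print(below, meets)
```

Keeping the record count next to each average makes it easier to spot users whose average rests on very few nights.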
Observations:
The analysis reveals that out of the 24 users in the dataset, the average sleep duration ranges from as low as 1.02 hours to as high as 10.87 hours. However, it's important to note that the average sleep duration for some users may be skewed due to the low number of sleep records. For instance, User ID 2320127002 has only one sleep record, resulting in their average sleep duration of 1.02 hours.
Given that 50% of the users are not meeting the recommended sleep duration, there is an opportunity to improve overall sleep patterns in the user base. As also mentioned, the dataset comprises a small sample of 24 users, indicating that fewer individuals are using the sleep tracking feature. (See Figure 18)
Recommendations:
Given the low number of records for several users, it is important to be cautious when discussing insights as the sample size may not be representative of Bellabeat's userbase. Within said context, Bellabeat could encourage more users to engage with the sleep tracking feature by implementing targeted marketing campaigns which may educate on the importance of adequate sleep for overall health and well-being, while tying back those metrics to Bellabeat's sleep tracking features. Lastly, implementing reminders or notifications to encourage users to log their sleep regularly could also help improve future data reliability.
Figure 18
Analysis 2B - Time In Bed vs Time Asleep
Description:
This analysis will compare TotalTimeInBed with TotalMinutesAsleep for insights into users' sleep patterns and identify potential issues related to falling asleep.
Data Manipulation Log:
Correlation Analysis:
Ran a correlation analysis between TotalTimeInBed and TotalMinutesAsleep to assess the strength of their relationship.
Used the correlation function =CORREL(range1, range2), where range 1 represents TotalTimeInBed and range 2 represents TotalMinutesAsleep.
Average Time Awake in Bed:
Created a new column titled "TotalMinutesAwake," representing the difference between TotalTimeInBed and TotalMinutesAsleep.
Ran summary statistics on TotalMinutesAwake to understand distribution and determine the average duration users spend in bed without sleeping.
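The TotalMinutesAwake column and its summary statistics can be illustrated with the short Python sketch below. The three nights of data are hypothetical.

```python
import statistics

# Hypothetical (TotalMinutesAsleep, TotalTimeInBed) pairs, one per night.
nights = [(327, 346), (384, 407), (412, 442)]

# TotalMinutesAwake = TotalTimeInBed - TotalMinutesAsleep
minutes_awake = [in_bed - asleep for asleep, in_bed in nights]

awake_summary = {
    "mean": statistics.mean(minutes_awake),
    "median": statistics.median(minutes_awake),
    "min": min(minutes_awake),
    "max": max(minutes_awake),
}
print(minutes_awake)
print(awake_summary)
```

Reporting the median alongside the mean helps when a few long wakeful nights pull the average upward.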
Observations:
Correlation Graph:
A strong correlation of 0.93 indicates a close positive relationship between TotalTimeInBed and TotalMinutesAsleep. On the correlation graph, the closer a data point lies to the y = x line, the less time that user spent awake in bed. (See Figure 19)
Summary Statistics Total Minutes Awake:
There is wide variability in TotalMinutesAwake, suggesting differences in users' sleep quality. On average, users spend approximately 39 minutes awake in bed. Three peaks can be discerned, suggesting three distinct segments of users experiencing sleep latency of approximately 300, 200, and 100 minutes respectively. (See Figure 20)
Recommendations:
The observed discrepancies between time in bed and sleep time may offer opportunities to introduce new features aimed at reducing sleep latency. For instance, this could include guided relaxation techniques or ways to optimize the sleep environment.
By leveraging insights from the three segments, Bellabeat can develop strategies to personalize recommendations, like targeted content delivery. For example, users in the 300-minute sleep latency segment may be experiencing sleep latency for different reasons than those in the 200- or 100-minute segments. Tailoring recommendations may thus be more effective in improving sleep time for each segment.
Figure 19
Figure 20
Analysis 2C - Sleep Consistency
Description:
This analysis will compare the variability in TotalMinutesAsleep and TotalTimeInBed for each user to assess sleep consistency. High variability may indicate irregular sleep patterns, potentially affecting sleep quality.
Data Manipulation Log:
Created a Pivot Table with UserID in the Rows section.
Added TotalMinutesAsleep and TotalTimeInBed to the Values section, setting them to STDEV.P to calculate the standard deviation.
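As an illustration of the per-user standard deviation, the sketch below uses Python's statistics.pstdev, which computes the population standard deviation. The user names and values are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical (Id, TotalMinutesAsleep) rows.
sleep_rows = [
    ("user_a", 400),
    ("user_a", 420),
    ("user_a", 440),
    ("user_b", 61),  # a user with a single sleep record
]


def sleep_variability(records):
    """Return {Id: (population standard deviation, number of records)}."""
    per_user = defaultdict(list)
    for user_id, minutes in records:
        per_user[user_id].append(minutes)
    return {
        user_id: (statistics.pstdev(vals), len(vals))
        for user_id, vals in per_user.items()
    }


variability = sleep_variability(sleep_rows)
print(variability)
```

A user with only one record always gets a standard deviation of zero, so it is worth reading each value together with its record count before treating a low value as a sign of consistent sleep.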
Observations:
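Users with low standard deviation values (e.g., 1844505072, 2320127002) seem to exhibit consistent sleep patterns with minimal day-to-day variation in both TotalMinutesAsleep and TotalTimeInBed. Note, however, that User ID 2320127002 has only one sleep record, so the low value for that user reflects a lack of data rather than demonstrated consistency. (See Figure 21)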
Users like 1644430081 or 4319703577 show a high standard deviation in TotalMinutesAsleep (289.90, 112.07), revealing significant variability in nightly sleep duration. This variability may stem from challenges in maintaining a consistent sleep schedule, potential sleep disruptions, or lifestyle factors. (See Figure 21)
Recommendations:
High variability amongst many users offers an opportunity to introduce feedback features that alert users to significant variability in their sleep patterns, prompting them to take more proactive steps to improve sleep consistency.
Further investigation into the reasons behind inconsistent sleep patterns, such as stress, irregular work hours, or sleep disorders, will be helpful in offering tailored messaging to promote good sleep habits, like limiting screen time before bed or practicing relaxation techniques.
Further investigation into the low variability group may also reveal successful sleep strategies being used, to be shared with others.
Figure 21
Analysis 2D - Sleep Record Frequency
Description:
This analysis will explore the frequency of sleep records for our sample to gain insights into users' sleep patterns. Multiple records, for instance, might indicate that users are experiencing sleep disruptions.
Data Manipulation Log:
Created a pivot table with UserID in the Rows field and TotalSleepRecords in the Values field.
Set TotalSleepRecords to summarize by AVERAGE and COUNTA to determine the average sleep records per day and the number of days data was collected for each user, respectively.
Added a new column titled Sleep Record Category to categorize users based on their average sleep records per day.
Used an IF statement to classify users as having "Single Record" or "Multiple Records", based on their average sleep records.
Created a Pie Chart to visualize the distribution of sleep record categories.
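The classification above can be sketched in Python as follows. The exact IF condition is not given here, so the rule used in the sketch (an average above 1 counts as "Multiple Records") is an assumption, as are the user names and values.

```python
from collections import defaultdict

# Hypothetical (Id, TotalSleepRecords) rows, one per recorded sleep day.
record_rows = [
    ("user_a", 1), ("user_a", 1), ("user_a", 1),
    ("user_b", 1), ("user_b", 2), ("user_b", 1), ("user_b", 2), ("user_b", 2),
]


def categorize_sleep_records(records):
    """Return per-user average records per day, day count, and category."""
    per_user = defaultdict(list)
    for user_id, n_records in records:
        per_user[user_id].append(n_records)
    result = {}
    for user_id, counts in per_user.items():
        avg = sum(counts) / len(counts)
        # Assumed rule: any average above 1 is treated as "Multiple Records".
        category = "Multiple Records" if avg > 1 else "Single Record"
        result[user_id] = {"avg_records": avg, "days": len(counts), "category": category}
    return result


record_summary = categorize_sleep_records(record_rows)
print(record_summary)
```

Including the day count in the output makes it possible to filter out users whose category is based on only a handful of recorded days.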
Observations:
50% of users have an average of 1 sleep record per day, indicating stable sleep patterns. That said, many users with a single sleep record per day also have a low count of recorded sleep days, so their data may not provide sufficient insight into their regular sleep patterns. In contrast, users with consistently higher counts may be more reliable for analysis and recommendations. (See Figure 22)
The other 50% of users exhibit more than 1 sleep record on average per day, suggesting possible sleep disruptions or napping habits. Having said that, the highest average sleep record per day observed is 1.6, indicating a relatively low frequency of sleep disruptions. (See Figure 22)
Recommendations:
A key recommendation, given the low count of records, is to encourage more regular sleep tracking for personalized insights and recommendations.
To foster a habit of regular sleep tracking, it may be helpful to explore the following three strategies, which incentivize and motivate users to engage with the sleep tracking feature more consistently, thereby providing more data for future analysis.
Gamification: Implement gamification elements such as badges, rewards, or challenges to make sleep tracking more interactive and enjoyable. Users can earn rewards or compete with friends to encourage adherence to sleep tracking goals.
Reminders and Notifications: Send regular reminders or notifications to prompt users to log their sleep data. Customizable reminders can be tailored to users' preferred bedtime or wake-up times, increasing the likelihood of consistent tracking.
Community Engagement: Foster a sense of community by allowing users to share their sleep experiences, tips, and challenges with each other. Peer support and social interaction can motivate users to stay engaged with the sleep tracking feature.
Figure 22
Analysis 2E - Sleep Day Analysis
Sleep Day Analysis: This analysis will explore sleep data trends over a specified date range (SleepDay) to uncover patterns across specific days of the week. By analyzing sleep duration across different days of the week, insights can be gained into sleep behaviors, including variations between weekdays and weekends.
Data Manipulation Log:
Created a new column titled DayofTheWeek and used the formula =TEXT(B2,"ddd") to convert dates into days of the week.
Created a Pivot Table with DayofTheWeek in the Rows field and TotalTimeInBed and TotalMinutesAsleep in the Values field, summarizing by AVERAGE.
Converted minutes to hours by dividing the results in each column by 60 to generate AverageTimeinBed and AverageTimeAsleep in Hours.
Compiled the summarized data into a final table, in order of the week, for DayofTheWeek analysis.
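For readers who prefer code to spreadsheet steps, the log above can be approximated as in the sketch below. The sample rows are hypothetical, and the weekday labels are taken from a fixed list rather than from the TEXT formula, so this is an equivalent rather than the exact method used.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows: (SleepDay, TotalMinutesAsleep, TotalTimeInBed).
rows = [
    ("4/13/2016", 420, 450),  # Wednesday
    ("4/14/2016", 390, 420),  # Thursday
    ("4/17/2016", 480, 510),  # Sunday
    ("4/20/2016", 450, 480),  # Wednesday
]

# Order of the week used for the final table (Monday is index 0).
WEEK_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_averages(rows):
    """Return {day: (avg hours asleep, avg hours in bed)} in week order."""
    asleep = defaultdict(list)
    in_bed = defaultdict(list)
    for sleep_day, minutes_asleep, minutes_in_bed in rows:
        # Same idea as =TEXT(B2,"ddd"): map each date to a weekday label.
        day = WEEK_ORDER[datetime.strptime(sleep_day, "%m/%d/%Y").weekday()]
        asleep[day].append(minutes_asleep)
        in_bed[day].append(minutes_in_bed)
    # Average per weekday, then convert minutes to hours by dividing by 60.
    return {
        day: (round(mean(asleep[day]) / 60, 2), round(mean(in_bed[day]) / 60, 2))
        for day in WEEK_ORDER
        if day in asleep
    }

result = weekday_averages(rows)
```

Days with no records are simply omitted from the result instead of being shown as zero.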
Observations:
Increased Sleep Duration on Weekends: The analysis reveals a noticeable increase in both time spent in bed and actual sleep time on Saturdays and Sundays compared to the weekday average. In fact, Sunday shows the highest average TotalTimeinBed (8.39 hours) and TotalTimeAsleep (7.55 hours) among all days of the week. (Figure 24)
Mid-Week Improvement: Wednesdays also exhibit a peak in sleep duration, with the average TotalTimeinBed reaching approximately 7.83 hours and TotalTimeAsleep averaging about 7.24 hours. This mid-week peak might suggest a recovery or 'push-back' phase between the start and end of the workweek, both of which record an average TimeAsleep of under 7 hours. (Figure 24)
Lower Sleep Duration Late in the Week: Thursday records the lowest average TotalTimeinBed (7.26 hours) and TotalTimeAsleep (6.71 hours) among the weekdays, suggesting a dip in sleep duration, and perhaps accumulated fatigue, towards the end of the workweek. (Figure 24)
Overall Sleep Deficit: Comparing the average sleep durations to the recommended 7-9 hours per night highlights a general sleep deficit during the weekdays. As noted above, every weekday except Wednesday falls short of the recommended duration, with Tuesday and Thursday showing notably lower averages. (Figure 24)
Recommendations:
Given the SleepDay trends, Bellabeat can tailor its engagement strategy to specific days of the week. For instance, on Mondays, Tuesdays, Thursdays, and Fridays, we can promote better sleep habits through alerts and notifications that encourage sleep-inducing bedtime routines.
Likewise, weekends and Wednesdays are a good opportunity to help users make the most of the longer sleep durations observed on those days. As an example, we can introduce weekend-exclusive meditation sessions or do-not-disturb features to promote deeper sleep on weekends, when users are likely to have extended sleep periods.
Figure 24
Analysis 3 - Introduction
For Analysis 3, I will merge the individual datasets used in the prior two analyses, using the user Id and the activity date as the common keys, and explore any correlations between physical activity levels and the corresponding sleep patterns.
To revisit,
Analysis 1 employed the following dataset:
DailyActivity_merged: contains daily activity data from Fitbit users, with the following columns:
Id: A unique identifier for each user.
ActivityDate: The date of the recorded activities.
TotalSteps: The total number of steps taken by the user on that day.
Calories: The total calories burned by the user on that day.
TotalDistance: The total distance covered by the user on that day. The unit is not stated in the dataset (likely kilometers or miles).
TrackerDistance: The distance tracked by the device.
LoggedActivitiesDistance: The distance for activities specifically logged by the user (as opposed to automatically detected by the device).
VeryActiveDistance: The distance covered during very active minutes.
ModeratelyActiveDistance: The distance covered during moderately active minutes.
LightActiveDistance: The distance covered during light activity.
SedentaryActiveDistance: The distance covered during sedentary activity.
VeryActiveMinutes: Minutes spent in very active activity.
FairlyActiveMinutes: Minutes spent in fairly active activity.
LightlyActiveMinutes: Minutes spent in lightly active activity.
SedentaryMinutes: Minutes spent in sedentary activity.
and Analysis 2 employed the following dataset:
SleepDay_merged: contains daily sleep data from Fitbit users, with the following columns:
Id: A unique identifier for each user.
SleepDay: The date of the recorded sleep.
TotalSleepRecords: Count of sleep activity recorded.
TotalMinutesAsleep: Minutes spent sleeping.
TotalTimeInBed: Minutes spent sleeping and lying awake.
To merge the above datasets by date, the SleepDay format '4/13/2016 12:00:00 AM' was split using the "Convert Text to Columns" function, with space as the delimiter, and the resulting timestamp columns (12:00:00 and AM) were removed to align with the ActivityDate format of '4/13/2016'.
To merge the above datasets by UserID, users who recorded activity data but no corresponding sleep data were removed. Although removing such incomplete data may improve the reliability of insights derived from subsequent analyses of activity and sleep patterns, the small number of remaining users (24) calls for caution when extracting insights, as the sample may not be representative of Bellabeat's userbase.
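The two merge steps described above amount to stripping the timestamp and then joining on user and date. The sketch below shows one way to do that on hypothetical sample records. Note that it drops individual user-days without a sleep record, which is slightly stricter than removing only users with no sleep data at all.

```python
# Hypothetical sample records shaped like the two datasets described above.
activity = [
    {"Id": 1001, "ActivityDate": "4/13/2016", "SedentaryMinutes": 700},
    {"Id": 1001, "ActivityDate": "4/14/2016", "SedentaryMinutes": 650},
    {"Id": 1002, "ActivityDate": "4/13/2016", "SedentaryMinutes": 900},  # no sleep data
]
sleep = [
    {"Id": 1001, "SleepDay": "4/13/2016 12:00:00 AM", "TotalMinutesAsleep": 420},
    {"Id": 1001, "SleepDay": "4/14/2016 12:00:00 AM", "TotalMinutesAsleep": 390},
]

def merge_activity_sleep(activity, sleep):
    """Inner-join activity and sleep rows on (Id, date)."""
    sleep_by_key = {}
    for row in sleep:
        # Split on the space and keep only the date part, e.g. '4/13/2016'.
        date_only = row["SleepDay"].split(" ")[0]
        sleep_by_key[(row["Id"], date_only)] = row

    merged = []
    for row in activity:
        match = sleep_by_key.get((row["Id"], row["ActivityDate"]))
        if match is not None:  # keep only days that have both kinds of data
            merged.append({**row, "TotalMinutesAsleep": match["TotalMinutesAsleep"]})
    return merged

merged = merge_activity_sleep(activity, sleep)
```

Joining on both keys avoids pairing one user's activity with another user's sleep on the same date.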
Analysis 3A - Activity and Sleep Data Correlation Analysis
Correlation Analysis: This analysis will explore the relationship between physical activity (e.g., SedentaryMinutes, VeryActiveMinutes) and sleep patterns (e.g., TotalTimeAsleep, TotalTimeAwake).
Data Manipulation Log:
Created a Pivot Table to merge the datasets, with ActivityDate in the Rows field and the activity-minutes and sleep-minutes columns in the Values field, summarizing by AVERAGE.
Compiled the summarized data into a final table, displaying a Correlation matrix.
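A correlation matrix like the one described above can be reproduced with a few lines of Python. The sketch assumes the Pearson coefficient (the measure spreadsheet CORREL functions compute), and the daily averages shown are hypothetical placeholders, not the values behind the figures reported below.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily averages, one value per ActivityDate.
columns = {
    "SedentaryMinutes": [700, 720, 680, 750],
    "VeryActiveMinutes": [20, 25, 30, 15],
    "TotalMinutesAsleep": [420, 400, 430, 390],
}

# Every pairwise coefficient, rounded to three decimals as in the report.
matrix = {
    a: {b: round(pearson(columns[a], columns[b]), 3) for b in columns}
    for a in columns
}
```

With only a few dozen daily averages, coefficients of this kind are sensitive to individual days, which is one more reason to read them cautiously.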
Observations:
Sedentary Minutes vs. Total Minutes Asleep (-0.252): There is a weak negative correlation between sedentary minutes and total minutes asleep, suggesting that as sedentary behavior increases, total sleep duration tends to decrease slightly.
Lightly Active Minutes vs. Total Minutes Asleep (-0.238): There is a weak negative correlation between lightly active minutes and total minutes asleep, suggesting a slight decrease in total sleep duration with higher levels of lightly active minutes.
Fairly Active Minutes vs. Total Minutes Asleep (-0.236): There is a weak negative correlation between fairly active minutes and total minutes asleep, highly comparable to LightlyActive minutes.
Very Active Minutes vs. Total Minutes Asleep (-0.345): There is a moderate negative correlation between very active minutes and total minutes asleep. This indicates a stronger negative relationship compared to lightly and fairly active minutes, suggesting that higher levels of very active minutes are associated with shorter total sleep durations.
Recommendations:
Given that lower sedentary time may be associated with more total minutes asleep, there is an opportunity to improve sleep time by encouraging users to reduce sedentary behavior during the day through reminders to stand up, stretch, or take short walks. For instance, Bellabeat can incentivize users to engage in light to moderate physical activities throughout the day, aiming to optimize sleep quality.
It's important to note that correlation does not imply causation. While physical activity may influence sleep patterns, all correlations are relatively weak, suggesting that other factors, such as stress, lifestyle, and environment, also play a crucial role. Therefore, further analysis on a larger sample may be needed to better explore the factors influencing sleep quality.
Figure 25
Analysis 3B - Activity and Sleep Temporal Analysis
Temporal Analysis: This analysis will explore trends in physical activity levels and sleep metrics over time to identify any temporal trends between physical activity and sleep behaviors.
Data Manipulation Log:
Created a Pivot Table to merge the datasets, with ActivityDate in the Rows field and the activity-minutes and sleep-minutes columns in the Values field, summarizing by AVERAGE.
Created a line graph to visualize trends.
Observations:
The sedentary minutes trend at times exhibits a negative correlation with sleep duration, corroborating the prior analysis that increased sedentary behavior may be associated with shorter sleep durations. (Figure 26)
The light activity minutes line, at times, exhibits a positive correlation with the sleep duration line, suggesting that higher levels of light activity may contribute to longer sleep duration. (Figure 26)
While there are some correlations between activity and sleep metrics on a daily basis, for the most part, the relationships are not consistent and are likely being influenced by various other factors. Further data is needed to provide deeper insights into the interaction between sleep and activity metrics. (Figure 26)
Recommendations:
Given the need for further data, Bellabeat can consider exploring data tracking for other factors which are known to significantly impact sleep quality such as diet, stress levels, and sleep environment. Keeping track of these variables alongside the activity levels can offer better clarity into factors influencing sleep patterns.
Figure 26
Share
Having analyzed the Fitbit userbase from several angles and gained insights from each, we present in this final stage the consolidated set of recommendations for Bellabeat stakeholders. These recommendations are aligned with the overarching business objective: to offer insights into consumer behavior and uncover growth opportunities for Bellabeat.
Engagement Strategies for Optimal Days: Bellabeat should target Tuesdays, Wednesdays, and Thursdays for engagement with motivational messages or challenges to capitalize on moderate activity levels. Similarly, Bellabeat should address the worryingly high sedentary behavior observed in the userbase by promoting consistent movement through hourly stretching prompts during the 9-5 workday. On weekends, Bellabeat should shift its strategy to focus on promoting lighter or recreational activities to aid recovery from the prior week while still limiting sedentary behavior. (See Analysis 1B to 1E, 2E)
Promoting Sleep Tracking: Bellabeat should encourage more regular sleep tracking to improve personalized insights and recommendations. We can implement strategies such as gamification, alert reminders, and community engagement to incentivize users to consistently use the sleep tracking feature by wearing their trackers to bed, thus enhancing data reliability and enabling tailored sleep-related recommendations. (See Analysis 2A, 2D, 3B)
Managing Physical Activity to Improve Sleep Quality: By leveraging the potential correlation between decreased sedentary time and improved sleep duration, we can encourage users to reduce sedentary behavior during the day through reminders to stand up, stretch, or take short walks, in turn improving sleep quality. While acknowledging the weak correlations between physical activity and sleep patterns, Bellabeat may also consider tracking additional sleep factors to gain more refined insights. (See Analysis 3A, 3B)
Thank you for taking the time to review my Data Analytics Case Study.
Your engagement is greatly appreciated!
If you have any questions, or are seeking guidance in your professional endeavours, or just want to chat, please do not hesitate to reach out.
I would love to support and assist you in any way I can.