Bellabeat is a high-tech manufacturer that focuses primarily on products for women. It is a small company with the potential to become a much larger player in the global smart device market. To better understand how smart devices are used and to help increase sales, data on how users use their devices will be analyzed. The analysis follows six phases: Ask, Prepare, Process, Analyze, Share, and Act. The findings show that users who got more sleep walked more and faster, and burned more calories as a result.
Ask
Questions for the analysis
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat's marketing strategy?
Identify the business task:
Analyze smart device usage data to gain insight into how people already use their smart devices, then use this information to recommend how these trends can inform Bellabeat's marketing strategy.
Consider key stakeholders:
The main stakeholders are Urška Sršen, Bellabeat’s co-founder and Chief Creative Officer; Sando Mur, Mathematician and Bellabeat’s cofounder; and the Bellabeat marketing analytics team.
The business task:
To ensure that Bellabeat's customers get complete, useful value from their devices, the collected data will be used to improve the devices and support marketing. All data will be thoroughly researched and processed in Google Sheets and R.
Prepare
Download data and store it appropriately:
The data will be downloaded and stored in Google Cloud. A lock will be placed on the data for security, and permissions will be created to control who can access it.
Identify how it's organized:
The tables are organized by daily habits such as activity, calories, intensities, steps, and sleep, recorded per user (Id) and date. The frequency of each habit will be observed and analyzed.
Sort and filter the data:
Before sorting and filtering, the data will be cleaned to ensure there are no duplicates or errors. The data will then be sorted by date and frequency. The primary focus will be on meaningful data, such as the most consistent habits.
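A minimal base-R sketch of the cleaning and sorting described above, using a small made-up table named `activity` (hypothetical) with `Id`, `ActivityDate` and `TotalSteps` columns like those in `dailyActivity_merged`, and assuming dates are stored as month/day/year text:

```r
# Hypothetical toy data shaped like dailyActivity_merged (values are made up)
activity <- data.frame(
  Id           = c(2, 1, 1, 2, 1),
  ActivityDate = c("4/13/2016", "4/13/2016", "4/12/2016", "4/12/2016", "4/12/2016"),
  TotalSteps   = c(8000, 10500, 9200, 4300, 9200)
)

# Remove exact duplicate rows (the last row repeats the third one)
activity <- activity[!duplicated(activity), ]

# Parse the month/day/year text into Date values so sorting is chronological
activity$ActivityDate <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")

# Sort by user, then by date
activity <- activity[order(activity$Id, activity$ActivityDate), ]
print(activity)
```

Converting the dates before sorting matters: sorted as plain text, "4/13/2016" would come before "4/2/2016".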
Determine the credibility of the data:
The data comes from the FitBit Fitness Tracker Data set, which records the behavior of thirty FitBit users. Because the sample is small, the findings should be treated as indicative rather than representative of all smart device users.
Process
Check the data for errors:
The data was checked for errors in Google Sheets during the Prepare phase.
Choose your tools:
The tool used was R.
Transform the data so you can work with it effectively:
The data was transformed with R code.
Document the cleaning process:
Notes were taken during the process, and all cleaning and manipulation of the data was documented.
Loading Package
library(tidyverse)
Importing The Data
dailyActivity_merged <- read_csv("dailyActivity_merged.csv")
dailyCalories_merged <- read_csv("dailyCalories_merged.csv")
dailyIntensities_merged <- read_csv("dailyIntensities_merged.csv")
dailySteps_merged <- read_csv("dailySteps_merged.csv")
heartrate_seconds_merged <- read_csv("heartrate_seconds_merged.csv")
minuteSleep_merged <- read_csv("minuteSleep_merged.csv")
sleepDay_merged <- read_csv("sleepDay_merged.csv")
weightLogInfo_merged <- read_csv("weightLogInfo_merged.csv")
Merging The Data
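# Join daily activity with daily calories on user Id and calorie value
merge_1 <- merge(dailyActivity_merged, dailyCalories_merged, by = c("Id","Calories"))

# Merging dailyIntensities_merged with itself on all of its columns returns the same rows
# (assuming it has no duplicate rows), so merge_2 is effectively a copy of that table
merge_2 <- merge(dailyIntensities_merged, dailyIntensities_merged, by = c("Id","ActivityDay","SedentaryMinutes", "LightlyActiveMinutes","FairlyActiveMinutes","VeryActiveMinutes", "SedentaryActiveDistance", "LightActiveDistance", "ModeratelyActiveDistance", "VeryActiveDistance"))

# Combine activity/calories with intensities, drop the extra ActivityDay column
# and rename ActivityDate to Date
merge_daily <- merge(merge_1, merge_2, by = c("Id","ActivityDay","SedentaryMinutes", "LightlyActiveMinutes","FairlyActiveMinutes","VeryActiveMinutes", "SedentaryActiveDistance", "LightActiveDistance", "ModeratelyActiveDistance", "VeryActiveDistance")) %>%
  select(-ActivityDay) %>%
  rename(Date = ActivityDate)

# Add the sleep data. Joining on Id alone pairs every sleep record with every activity
# day of the same user, which inflates the row count (15,901 rows in the summary below)
daily_data <- merge(merge_daily, sleepDay_merged, by = "Id", all = TRUE) %>%
  drop_na() %>%
  select(-SleepDay, -TrackerDistance)

# Widen plots in the notebook output
options(repr.plot.width = 30)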
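To illustrate what the join key does, the base-R sketch below uses made-up `activity` and `sleep` tables (hypothetical) that already share a parsed `Date` column. In the real data the dates live in `ActivityDate` and `SleepDay` and would first need converting to a common date column; that step is assumed here, not shown.

```r
# Hypothetical toy data: two activity days and two sleep records for one user
activity <- data.frame(
  Id         = c(1, 1),
  Date       = as.Date(c("2016-04-12", "2016-04-13")),
  TotalSteps = c(9000, 4000)
)
sleep <- data.frame(
  Id                 = c(1, 1),
  Date               = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(420, 300)
)

# Joining on Id alone pairs every activity day with every sleep record: 2 x 2 = 4 rows
by_id <- merge(activity, sleep, by = "Id")

# Joining on Id and Date keeps one row per user-day: 2 rows
by_id_date <- merge(activity, sleep, by = c("Id", "Date"))

nrow(by_id)
nrow(by_id_date)
```

With the date in the key, each day's steps are matched only with that same night's sleep, so the 9,000-step day is paired with 420 minutes asleep.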
Quick Review
summary(daily_data)
Variable                    Min.        1st Qu.     Median      Mean        3rd Qu.     Max.
Id                          1.504e+09   4.020e+09   4.703e+09   5.117e+09   6.962e+09   8.792e+09
SedentaryMinutes            0.0         687.0       781.0       938.6       1440.0      1440.0
LightlyActiveMinutes        0.0         0.0         171.0       156.4       240.0       518.0
FairlyActiveMinutes         0.00        0.00        3.00        13.58       19.00       143.00
VeryActiveMinutes           0.00        0.00        0.00        18.76       28.00       210.00
SedentaryActiveDistance     0.0000000   0.0000000   0.0000000   0.0005276   0.0000000   0.1100000
LightActiveDistance         0.000       0.000       2.860       2.771       4.480       10.300
ModeratelyActiveDistance    0.0000      0.0000      0.1100      0.5729      0.7900      6.4800
VeryActiveDistance          0.000       0.000       0.000       1.094       1.740       13.400
Calories                    0           1693        2013        2220        2643        4900
Date                        character vector, length 15901
TotalSteps                  0           0           6393        6351        10460       22988
TotalDistance               0.000       0.000       4.480       4.487       7.390       17.950
LoggedActivitiesDistance    0.00000     0.00000     0.00000     0.09649     0.00000     4.94214
TotalSleepRecords           1.000       1.000       1.000       1.116       1.000       3.000
TotalMinutesAsleep          58.0        360.0       427.0       417.3       490.0       796.0
TotalTimeInBed              61.0        402.0       459.0       456.1       530.0       961.0
Analyze
Aggregate your data so it's useful and accessible:
Permission to access this data was given to specific users.
Organize and format the data:
The data was organized in Google Sheets and formatted in R.
Perform Calculations:
The necessary calculations were performed in R.
Identify trends and relationships:
The data shows that Fairly Active and Very Active users burn the most calories. This was not a surprise but a confirmation of what was expected. Most calories were burned in the 6k to 10k steps range; within the medium-distance category (4.5 to 7 mi) the most calories were burned at more than 10k steps, and within the low-distance category (under 4.5 mi) at fewer than 6k steps. This suggests that speed plays an important part in burning more calories.
There is also a relationship between activity level and sleep quality. Sedentary users have the largest percentage of bad sleepers, whereas even some activity shows a large increase in normal sleep. There was also a decrease in over sleepers (more than 8 hours) in the most active categories.
Grouping Users Into Four Categories:
# Label each row (one user-day) with a user type by comparing its minutes at each
# activity level with the overall column means. Rows that match none of the four
# patterns get NA and are removed by drop_na().
data_by_usertype <- daily_data %>%
  mutate(user_type = factor(case_when(
    SedentaryMinutes > mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Sedentary",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes > mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Lightly Active",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes > mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Fairly Active",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes > mean(VeryActiveMinutes) ~ "Very Active"
  ), levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))) %>%
  select(Id, user_type, Calories) %>%
  drop_na()
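The classification rule can be hard to read inside `case_when()`. The sketch below restates it in base R as a hypothetical helper, `classify_user_type()`, applied to made-up minutes for five user-days: each value is compared with its column mean, a row gets a label only when its own activity level is above the mean and the other three are below, and any other combination is left as NA (which `drop_na()` then removes).

```r
# Hypothetical toy minutes for five user-days (values are made up)
days <- data.frame(
  SedentaryMinutes     = c(1200, 600, 700, 650, 500),
  LightlyActiveMinutes = c(  60, 300, 100, 120, 250),
  FairlyActiveMinutes  = c(   5,  10,  60,  10,  40),
  VeryActiveMinutes    = c(   0,   5,  10,  90,  50)
)

# Hypothetical helper mirroring the case_when() rule above
classify_user_type <- function(sed, light, fairly, very) {
  above <- function(x) x > mean(x)  # TRUE where a value is above its column mean
  below <- function(x) x < mean(x)  # TRUE where a value is below its column mean
  type <- rep(NA_character_, length(sed))
  # The four patterns are mutually exclusive, so assignment order does not matter
  type[above(sed) & below(light) & below(fairly) & below(very)] <- "Sedentary"
  type[below(sed) & above(light) & below(fairly) & below(very)] <- "Lightly Active"
  type[below(sed) & below(light) & above(fairly) & below(very)] <- "Fairly Active"
  type[below(sed) & below(light) & below(fairly) & above(very)] <- "Very Active"
  factor(type, levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))
}

days$user_type <- with(days, classify_user_type(SedentaryMinutes, LightlyActiveMinutes,
                                                FairlyActiveMinutes, VeryActiveMinutes))
print(days$user_type)
```

With these toy numbers the first four rows each get one label, and the fifth, which is above average at three levels, gets NA. On real data this means some user-days are excluded from the plots.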
User Type Distribution and Calories Burned For Every User Type:
ggplot(data_by_usertype, aes(user_type, Calories, fill = user_type)) +
  geom_boxplot() +
  labs(title = "Calories burned by User type", x = NULL) +
  theme(legend.position = "none", text = element_text(size = 20), plot.title = element_text(hjust = 0.5))
Plot - Distance/Steps and Calories Burned:
# Bin each user-day by total distance (miles) and total steps, then compare calories
daily_data %>%
  mutate(
    distance = factor(case_when(
      TotalDistance < 4.5 ~ "< 4.5 mi",
      TotalDistance >= 4.5 & TotalDistance <= 7 ~ "4.5 to 7 mi",
      TotalDistance > 7 ~ "> 7 mi"
    ), levels = c("> 7 mi", "4.5 to 7 mi", "< 4.5 mi")),
    steps = factor(case_when(
      TotalSteps < 6000 ~ "< 6k steps",
      TotalSteps >= 6000 & TotalSteps <= 10000 ~ "6k to 10k steps",
      TotalSteps > 10000 ~ "> 10k steps"
    ), levels = c("> 10k steps", "6k to 10k steps", "< 6k steps"))
  ) %>%
  ggplot(aes(steps, Calories, fill = steps)) +
  geom_boxplot() +
  facet_wrap(~distance) +
  labs(title = "Calories burned by Steps and Distance", x = NULL) +
  theme(legend.position = "none", text = element_text(size = 20), plot.title = element_text(hjust = 0.5))
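For whole-number step counts, the same three step bins can also be built with base R's `cut()`. A short sketch on made-up values (the labels are illustrative):

```r
# Hypothetical daily step counts (whole numbers, as in TotalSteps)
steps <- c(0, 5999, 6000, 8500, 10000, 10001, 15000)

# With right = FALSE each interval is [lower, upper), so for whole-number steps:
#   [-Inf, 6000)  -> "< 6k steps"
#   [6000, 10001) -> "6k to 10k steps"  (6000 and 10000 both land here)
#   [10001, Inf)  -> "> 10k steps"
step_bin <- cut(steps,
                breaks = c(-Inf, 6000, 10001, Inf),
                labels = c("< 6k steps", "6k to 10k steps", "> 10k steps"),
                right = FALSE)
table(step_bin)
```

Because `cut()` applies the same closure to every interval, the upper break is 10001 so that exactly 10,000 steps stays in the middle bin, matching the `case_when()` boundaries. This relies on steps being integers and would not carry over to the continuous distance bins.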
Classifying Sleep Quality and Calculating the Share of Each Sleep Category by User Type:
# Per user (Id): label each day with a user type by comparing it with that user's own
# averages, and label the user with a sleep type from their average minutes asleep
# (360 to 480 minutes inclusive counts as normal sleep). reframe() returns one row per
# day; the per-user sleep_type and total_sleep values are recycled across those rows.
sleepType_by_userType <- daily_data %>%
  group_by(Id) %>%
  reframe(
    user_type = factor(case_when(
      SedentaryMinutes > mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Sedentary",
      SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes > mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Lightly Active",
      SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes > mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Fairly Active",
      SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes > mean(VeryActiveMinutes) ~ "Very Active"
    ), levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active")),
    sleep_type = factor(case_when(
      mean(TotalMinutesAsleep) < 360 ~ "Bad Sleep",
      mean(TotalMinutesAsleep) >= 360 & mean(TotalMinutesAsleep) <= 480 ~ "Normal Sleep",
      mean(TotalMinutesAsleep) > 480 ~ "Over Sleep"
    ), levels = c("Bad Sleep", "Normal Sleep", "Over Sleep")),
    total_sleep = sum(TotalMinutesAsleep)
  ) %>%
  drop_na() %>%
  # Count rows in each sleep category per user type, then convert the counts to shares
  group_by(user_type) %>%
  summarise(
    bad_sleepers = sum(sleep_type == "Bad Sleep"),
    normal_sleepers = sum(sleep_type == "Normal Sleep"),
    over_sleepers = sum(sleep_type == "Over Sleep"),
    total = n(),
    .groups = "drop"
  ) %>%
  mutate(
    bad_sleepers = bad_sleepers / total,
    normal_sleepers = normal_sleepers / total,
    over_sleepers = over_sleepers / total
  ) %>%
  select(-total)
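The share calculation can be checked by hand on a small example. This simplified base-R sketch assumes one user type per user (the code above labels each user-day, so its shares are over user-days rather than users) and uses made-up average sleep minutes with the same thresholds: under 360 minutes is bad sleep, 360 to 480 is normal, and over 480 is over sleep.

```r
# Hypothetical per-user summary: each user's type and average minutes asleep (made up)
users <- data.frame(
  user_type  = c("Sedentary", "Sedentary", "Very Active", "Very Active", "Very Active"),
  mean_sleep = c(300, 400, 420, 450, 500)
)

# Classify average sleep with the thresholds used in the analysis
users$sleep_type <- factor(
  ifelse(users$mean_sleep < 360, "Bad Sleep",
         ifelse(users$mean_sleep <= 480, "Normal Sleep", "Over Sleep")),
  levels = c("Bad Sleep", "Normal Sleep", "Over Sleep")
)

# Share of each sleep category within each user type (each row sums to 1)
shares <- prop.table(table(users$user_type, users$sleep_type), margin = 1)
round(shares, 2)
```

Here half of the Sedentary users are bad sleepers, while two thirds of the Very Active users sleep normally and one third oversleep.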
Preparing the Data for Plotting by User Type:
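# Reshape to long format (user_type, variable, value) for plotting; tidyr's
# pivot_longer() is used because melt() comes from reshape2, which is not loaded
sleepType_by_userType_melted <- sleepType_by_userType %>%
  pivot_longer(-user_type, names_to = "variable", values_to = "value")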
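For reference, this is the long layout that `melt()` and `pivot_longer()` produce, with one row per user type and sleep category. The sketch builds it in base R from a made-up wide table so the reshaping step is visible; the share values are invented for illustration.

```r
# Hypothetical wide table: one row per user type, one column per sleep-category share
wide <- data.frame(
  user_type       = c("Sedentary", "Very Active"),
  bad_sleepers    = c(0.50, 0.00),
  normal_sleepers = c(0.50, 0.67),
  over_sleepers   = c(0.00, 0.33)
)

measure_cols <- c("bad_sleepers", "normal_sleepers", "over_sleepers")

# Stack the share columns: one row per (user_type, variable) pair
long <- data.frame(
  user_type = rep(wide$user_type, times = length(measure_cols)),
  variable  = rep(measure_cols, each = nrow(wide)),
  value     = unlist(wide[measure_cols], use.names = FALSE)
)
print(long)
```

The long form is what a grouped or stacked bar chart needs: `variable` can be mapped to the fill color and `value` to the bar height.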
Share
Determine the best way to share your findings:
The best way to share my findings is through visualizations that my entire audience can understand.
Create effective data visualization:
The findings are presented as charts, such as box plots of calories by user type and by steps and distance. Different colors were used to draw attention to the most important data.
Present your findings:
A presentation of the findings was scheduled through Google Calendar. Once everyone approved the schedule, the findings were presented to a select audience that included the main stakeholders: Urška Sršen, Bellabeat’s co-founder and Chief Creative Officer; Sando Mur, Mathematician and Bellabeat’s cofounder; and the Bellabeat marketing analytics team.
Ensure your work is accessible:
The work has been made accessible to my manager and co-workers. Links were provided to anyone who needs access, so the information remains available if I am out.
Act
Final conclusion based on your analysis:
There is a relationship between higher-intensity activity and calories burned. Users need motivation, so logging activity through their device could be helpful. There is also a clear trend linking activity level to sleep quality: more active users slept better, which suggests that staying active could improve sleep and overall health.
How could your team and business apply your insights?:
Based on these findings, I would have marketing focus on informing current and potential users, through notifications or emails, about their sleep and how many calories they can burn when they are well rested. Sharing some of the facts from this analysis with users would let them see that the company took the time to find the best products for them based on how they actually use their devices.
What next steps would you or your stakeholders take based on your findings?:
The stakeholders should now develop new marketing strategies using the findings shared with them. Now that they know which features users rely on most, they could also consider upgrading their products, and new products designed around these behaviors may be worth exploring.