This is a case study for Bellabeat, a high-tech manufacturer of health-focused products for women. The goal of this project is to analyze how consumers use similar smart devices from other companies, and to turn that analysis into insights that help stakeholders make data-driven decisions about marketing Bellabeat's own products. For this project, I will be using R in RStudio for data cleaning and analysis.
About the company:
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.
Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.
Products:
Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to the company's line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
The Data:
The data for this project comes from Kaggle and can be found here - https://www.kaggle.com/datasets/arashnic/fitbit
The following is the description of the dataset from Kaggle:
"This dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences."
Loading packages and libraries:
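library(readxl)    # read_excel() for the .xlsx files
library(plyr)      # load plyr before the tidyverse so it doesn't mask dplyr verbs like mutate()
library(tidyverse) # attaches dplyr, ggplot2, tidyr, and friends
library(scales)    # scale and label helpers for ggplot2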
Loading Excel files:
dailyActivity <- read_excel("C:/dailyActivity_merged.xlsx")
weightLog <- read_excel("C:/weightLogInfo_merged.xlsx")
sleepDay <- read_excel("C:/sleepDay_merged.xlsx")
hourlySteps <- read_excel("C:/hourlySteps_merged.xlsx")
hourlyCalories <- read_excel("C:/hourlyCalories_merged.xlsx")
hourlyIntensities <- read_excel("C:/hourlyIntensities_merged.xlsx")
heartRate <- read_excel("C:/heartrate_seconds_merged.xlsx")
Data cleaning and transformation:
Thankfully, the dataset was already fairly clean going into my analysis. I previewed the data in Excel before loading it into RStudio and did some cleaning there, such as removing duplicate rows and checking for null values that would throw off the analysis. I took some other cleaning steps in R, which you will see below. One big piece of this was the date values: in most of the tables, the date and time were stored together in a single "MM/DD/YYYY HH:MM:SS" column. Some of the visualizations focus on just the date or just the time, and using both together gave irrelevant results, so I had to split that column into two separate columns.
First, let's look at some of the data to get an idea of what we're working with, using the head() and colnames() functions:
Sample of code:
head(dailyActivity)
colnames(dailyActivity)
Output: [head() and colnames() results for dailyActivity, weightLog, sleepDay, hourlySteps, hourlyCalories, hourlyIntensities, and heartRate]
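Running head() and colnames() on seven tables one at a time gets repetitive. Below is a minimal sketch of doing both in one pass by collecting the tables in a named list. The two toy tibbles are made-up stand-ins for the real data frames; with the real data you would put all seven loaded tables in the list instead.

```r
library(tibble)

# Toy stand-ins for two of the loaded tables (made-up values, illustration only)
dailyActivity <- tibble(
  Id = c(1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-12")),
  TotalSteps = c(10000, 8000)
)
sleepDay <- tibble(
  Id = c(1, 2),
  TotalMinutesAsleep = c(420, 390)
)

# Collect the tables in a named list so each preview step runs once per table
tables <- list(dailyActivity = dailyActivity, sleepDay = sleepDay)

# Column names of every table, keyed by table name
col_list <- lapply(tables, colnames)
print(col_list)

# First rows of every table, each under its own label
invisible(lapply(names(tables), function(nm) {
  cat("\n==", nm, "==\n")
  print(head(tables[[nm]]))
}))
```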
Now that we have a better idea of what the data looks like, there are a few more steps for cleaning that I need to do before the data is ready for analysis.
I'll start with the Daily Activity table. I want to extract the day of the week from the ActivityDate column.
I also want to drop several redundant columns that I know I won't use in the analysis; removing them now makes the table easier to work with.
Here is the code I ran, followed by glimpse() and colnames() to check the result:
#cleaning the dailyActivity spreadsheet: added a day-of-week column and removed redundant columns
daily <- dailyActivity %>%
mutate(DayWeek = weekdays(ActivityDate)) %>% #turning the date into a day of the week for later analysis
select(-c(5:10)) #dropping redundant columns
glimpse(daily)
colnames(daily)
Next, I want to do something similar with the Weight Log table: drop redundant columns (for example, the weight is listed in both kg and pounds) and convert the date to a plain date to make later analysis easier.
weightLog <- weightLog %>%
mutate(Date = as.Date(Date, format = "%m/%d/%Y %H:%M")) %>%
select(-c(3, 7, 8)) #dropping columns we don't need: weight in kg (3), IsManualReport (7), and LogId (8)
head(weightLog)
Next, I want to join all of the hourly data into one table, hourlyActivity, so that it mirrors the Daily Activity table.
hourlyActivity <- hourlyCalories %>%
left_join(hourlyIntensities, by = c("Id", "ActivityHour")) %>%
left_join(hourlySteps, by = c("Id", "ActivityHour"))
head(hourlyActivity)
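Before relying on the joined table, it's worth checking that the join didn't add or lose rows. Below is a minimal sketch using dplyr's anti_join() on toy tables with made-up values. The StepTotal column name is an assumption about hourlySteps. Any hourlyCalories row with no match in hourlySteps shows up in the anti-join and as NA in the joined step column.

```r
library(dplyr)

# Toy hourly tables (made-up values); the real ones come from the Excel files
hourlyCalories <- tibble(
  Id = c(1, 1, 2),
  ActivityHour = as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 01:00:00",
                              "2016-04-12 00:00:00"), tz = "UTC"),
  Calories = c(80, 60, 70)
)
hourlySteps <- tibble(
  Id = c(1, 1),
  ActivityHour = as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 01:00:00"), tz = "UTC"),
  StepTotal = c(300, 150)
)

hourlyActivity <- hourlyCalories %>%
  left_join(hourlySteps, by = c("Id", "ActivityHour"))

# Rows of hourlyCalories that found no partner in hourlySteps
unmatched <- anti_join(hourlyCalories, hourlySteps, by = c("Id", "ActivityHour"))
print(unmatched)

# With a unique (Id, ActivityHour) key, a left join keeps the row count unchanged
stopifnot(nrow(hourlyActivity) == nrow(hourlyCalories))
```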
Lastly, I want to fix the date column in the hourlySteps table.
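The code for this step isn't shown here. One way to do it is sketched below, under the assumption that ActivityHour is stored as text like "4/12/2016 1:00:00 PM": parse it once, then derive separate Date, Time, and Hour columns. If read_excel() already returned a date-time column, the parsing line can be skipped. Note that %p (AM/PM) depends on an English locale. The toy table and the name hourly are hypothetical.

```r
library(dplyr)

# Toy hourly rows; ActivityHour is assumed to be text in month/day/year, 12-hour format
hourly <- tibble(
  Id = c(1, 1),
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM"),
  StepTotal = c(0, 450)
)

hourly <- hourly %>%
  mutate(
    # Parse once into a date-time (skip if read_excel already returned one)
    ActivityHour = as.POSIXct(ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"),
    Date = as.Date(ActivityHour),                   # calendar date only
    Time = format(ActivityHour, "%H:%M:%S"),        # 24-hour clock time as text
    Hour = as.integer(format(ActivityHour, "%H"))   # hour of day, 0-23
  )

print(hourly)
```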
Before getting into visualizations, I want to summarize the data to get a better sense of what we're looking at. To do this, I'll use the summary() function on the Daily Activity, Weight Log, Sleep, and Hourly Activity tables.
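Below is a minimal sketch of what that looks like for the daily table, using a few made-up rows in place of the real data. summary() gives the minimum, quartiles, mean, and maximum of each column. The per-user averages are an extra check (my addition, not part of the original analysis) on whether a few very active users dominate the pooled numbers.

```r
library(dplyr)

# Toy daily rows (made-up values); with the real data, use the cleaned daily table
daily <- tibble(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(10000, 4000, 7500, 12000),
  SedentaryMinutes = c(700, 1100, 900, 600),
  Calories = c(2200, 1700, 1900, 2500)
)

# Min, quartiles, mean, and max of the columns of interest
daily %>%
  select(TotalSteps, SedentaryMinutes, Calories) %>%
  summary()

# Per-user averages, one row per Id
per_user <- daily %>%
  group_by(Id) %>%
  summarise(mean_steps = mean(TotalSteps), mean_calories = mean(Calories))
print(per_user)
```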
Now I want to get into the business task: finding insights that stakeholders can use to better understand their target audience and make data-driven decisions. The first thing I want to look at is how the users in this sample Fitbit dataset actually use the different features of the product.
Is each user entering their weight and tracking that information, as well as using the Fitbit to monitor their workout progress or steps taken? Is each user using features like the heart rate monitor, or tracking their sleep with the product?
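To find this information, I will count the distinct IDs in each table to see which features the users in this Fitbit dataset use:
n_distinct(daily$Id)          # users with daily activity data
n_distinct(hourlyActivity$Id) # users with hourly activity data
n_distinct(weightLog$Id)      # users who logged their weight
n_distinct(sleepDay$Id)       # users who tracked their sleep
n_distinct(heartRate$Id)      # users with heart rate data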
I also want to visualize these counts to make them easier to compare:
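One way to build this chart is a bar chart of the distinct user count per table. The sketch below uses toy tables with made-up IDs, so its counts say nothing about the real dataset; with the real data, the same n_distinct() calls run on the loaded tables.

```r
library(dplyr)
library(ggplot2)

# Toy stand-ins for the five tables (made-up IDs, illustration only)
daily          <- tibble(Id = c(1, 1, 2, 3, 4))
hourlyActivity <- tibble(Id = c(1, 2, 3, 4, 4))
sleepDay       <- tibble(Id = c(1, 2, 2))
weightLog      <- tibble(Id = c(1))
heartRate      <- tibble(Id = c(3, 3))

# One row per table: how many distinct users appear in it
user_counts <- tibble(
  Feature = c("Daily activity", "Hourly activity", "Sleep", "Weight log", "Heart rate"),
  Users = c(n_distinct(daily$Id), n_distinct(hourlyActivity$Id),
            n_distinct(sleepDay$Id), n_distinct(weightLog$Id),
            n_distinct(heartRate$Id))
)

# Bars sorted from most to fewest users, with the count printed above each bar
p <- ggplot(user_counts, aes(x = reorder(Feature, -Users), y = Users)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = Users), vjust = -0.3) +
  labs(x = "Feature", y = "Distinct users", title = "Users per tracked feature")
print(p)
```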
The majority of users tracked their physical activity, but not all of them tracked their sleep. Even fewer users entered their weight into the app or used it to track their heart rate.
Next, I wanted to create a visualization that shows the relationship between Steps Taken and Calories burnt:
ggplot(daily, aes(x = TotalSteps, y = Calories)) +
geom_point(aes(color = SedentaryMinutes)) + # color only the points, so the trend line isn't affected
geom_smooth(method = "lm", formula = y ~ x) + # single linear trend line
xlab("Steps Taken") +
ylab("Calories Burnt") +
ggtitle("Steps Taken vs. Calories Burnt") +
scale_color_gradient(low = "red", high = "green")
This is useful because it connects what most Fitbit users are interested in (burning calories) with the number of steps they take: more steps go along with more calories burnt. Stakeholders can use this to help users get what they want out of the product, for example by reminding users to take more steps during the day, or by sending notifications as they reach milestones toward their daily goals, such as congratulating them when they are halfway to their step goal for the day.
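To put a number on the relationship in the plot, the correlation and the slope of the same straight line that geom_smooth(method = "lm") draws can be computed directly. Below is a minimal sketch on made-up rows. With the real data, each user contributes many days, so the rows aren't fully independent, and the correlation describes an association rather than a cause.

```r
# Toy daily rows (made-up values); replace with the cleaned daily table
daily <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1600, 1850, 2050, 2300, 2600)
)

# Pearson correlation between steps and calories
r <- cor(daily$TotalSteps, daily$Calories)

# Linear fit matching geom_smooth(method = "lm"); the slope is extra calories per step
fit <- lm(Calories ~ TotalSteps, data = daily)
slope <- unname(coef(fit)["TotalSteps"])

cat("correlation:", round(r, 3),
    " calories per 1,000 steps:", round(1000 * slope, 1), "\n")
```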
Next, I wanted to dig further into step counts, since we saw earlier that daily and hourly activity tracking is the feature most users in this sample rely on. Specifically, I want to find out what time of day most steps are recorded, and which days of the week have the most steps.
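Below is a minimal sketch of both aggregations. It assumes the hourly table already has a parsed ActivityHour date-time and a StepTotal column (both assumptions), and the rows are made up. Day names are built from POSIXlt$wday rather than weekdays() so the labels don't depend on the system locale, and the factor puts Monday first so plots sort in calendar order.

```r
library(dplyr)

# Toy hourly rows (made-up values); 2016-04-16 is a Saturday, 2016-04-18 a Monday
hourly <- tibble(
  ActivityHour = as.POSIXct(c("2016-04-16 09:00:00", "2016-04-16 18:00:00",
                              "2016-04-18 09:00:00", "2016-04-18 18:00:00"),
                            tz = "UTC"),
  StepTotal = c(200, 900, 400, 1000)
)

# English day names indexed by POSIXlt$wday + 1 (wday 0 = Sunday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

hourly <- hourly %>%
  mutate(
    Hour = as.POSIXlt(ActivityHour)$hour,
    DayWeek = factor(day_names[as.POSIXlt(ActivityHour)$wday + 1],
                     levels = c(day_names[2:7], day_names[1]))  # Monday first
  )

# Average steps recorded in each hour of the day
by_hour <- hourly %>%
  group_by(Hour) %>%
  summarise(mean_steps = mean(StepTotal))

# Average steps per hourly record on each day of the week
by_day <- hourly %>%
  group_by(DayWeek) %>%
  summarise(mean_steps = mean(StepTotal))

print(by_hour)
print(by_day)
```

Either result can be passed to ggplot() with geom_col() to chart it.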
During my analysis, I found how the users in the sample dataset use their Fitbit products. Based on this, my first recommendation would be to focus resources on the activity and fitness tracking features of the product rather than the heart rate monitor, sleep tracker, or weight logging. Fewer users used those features, so the marketing strategy should also focus on fitness and step tracking.
There was a clear correlation between steps taken and calories burnt, and we also found the most popular times for taking steps, both by time of day and by day of the week. All of this information can be combined to improve the product. My recommendation would be to add reminders and notifications that help users reach their goals throughout the week. Since activity tracking and step counts are the most popular features of the product, this also goes hand in hand with the marketing strategy. For example, notifications on Sunday, Monday, and Friday could remind users to get more steps in to reach their goal, since the analysis shows these are the days when people tend to slack off and take fewer steps. A notification around midday and in the early evening would reach people during the times when most of them take the majority of their steps. These kinds of tools could then be featured in the marketing for Bellabeat products, showing that Bellabeat cares about helping people reach their fitness goals.
Use the Bellabeat Spring product in the marketing strategy. The Spring product tracks water intake, which wasn't represented in the sample dataset at all. Because of this, the Spring product appears fairly distinctive, which can also help the Bellabeat brand by showing that it offers something some of its competitors don't.
Offer trial periods for the Bellabeat app. The app is going to be a subscription-based product, but my recommendation from this analysis would be to run a promotion every so often that lets people try out the Bellabeat app. People who tracked their steps and other activity continued to use the product throughout the entire month. I believe this suggests that people who use fitness tracking products generally stick with them rather than switching back and forth between products. If a trial period is offered, some people could start using the app for their fitness tracking, decide to pay for the full subscription after the trial ends, and be more likely to buy other Bellabeat products as well.
This case study and dataset were pretty fun to work with. I liked being able to create different visualizations based on fitness data, since tracking workouts and fitness data is something I personally do every week to help me reach my own fitness goals. If I were to spend more time on this case study, I would love to find more datasets that could help me give recommendations to drive Bellabeat's marketing strategy. Other information, such as user surveys asking which products people used most or found most useful, would be interesting to apply to this case study. The sample dataset could be tough to work with at times because it couldn't answer some of the questions I came up with during the project, such as why some users decided not to enter their weight into the Fitbit app, or why some people didn't use the heart rate monitor. I realized it can be hard to give recommendations based on the data I had, since analyzing some of the data led to questions that the data simply couldn't answer. Overall, I enjoyed working on this case study, and I learned a lot more about R programming in the process!