My second capstone project for the Google Data Analysis Certification was carried out for Bellabeat, a company that makes a fitness app and devices.
ASK
In the Ask phase I set out the key questions about the project.
1) Who are the stakeholders?
The Chief Creative Officer, a member of the executive team, and the marketing analytics team.
2) What is this project trying to find out?
1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat's marketing strategy?
PREPARE
In the Prepare stage I searched the available data sources and chose the Fitbit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius). This Kaggle data set contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits. I considered the data reliable because it is hosted on Kaggle and comes from original sources, the Fitbit users themselves. It is comprehensive, with columns such as total steps taken, sedentary minutes, heart rate, and number of minutes asleep. It has been checked for privacy and does not contain identifiable names or addresses of the Fitbit users.
PROCESS
I checked the CSV files and found that the data is available in both long and wide formats. I decided to use the following wide-format files for my analysis:
1) Daily Activity merged.
2) Sleep/Day merged.
3) Weight Log.
4) Heartrate/Sec.
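To illustrate the difference between the long and wide formats mentioned above, here is a minimal sketch using tidyr. The data frame `long_steps` and its values are made up for illustration and are not taken from the Fitbit files.

```r
library(tidyr)

# Hypothetical long-format data: one row per user and day.
long_steps <- data.frame(
  Id = c(1, 1, 2, 2),
  Day = c("Mon", "Tue", "Mon", "Tue"),
  Steps = c(5000, 7000, 3000, 9000)
)

# Wide format: one row per user, one column per day.
wide_steps <- pivot_wider(long_steps, names_from = Day, values_from = Steps)
print(wide_steps)

# Back to long format: one row per user and day again.
back_to_long <- pivot_longer(wide_steps, cols = c(Mon, Tue),
                             names_to = "Day", values_to = "Steps")
print(back_to_long)
```

The wide form is easier to read per user, while the long form is usually easier to group and plot.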
Since the data was large, I decided to work in RStudio Desktop. Below is the script used for cleaning and analysis.
# Install once; the tidyverse package already bundles ggplot2, dplyr and lubridate.
install.packages("tidyverse")
# Load the packages used in this script.
library(tidyverse)
library(lubridate)
library(ggplot2)
# Read the CSV files (expected in the working directory).
dailyActivity <- read.csv("dailyActivity_merged.csv")
heartrate_seconds <- read.csv("heartrate_seconds_merged.csv")
minuteMET <- read.csv("minuteMETsNarrow_merged.csv")
sleepDay <- read.csv("sleepDay_merged.csv")
weightLogInfo <- read.csv("weightLogInfo_merged.csv")
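# Preview each table and list its column names.
head(dailyActivity)
colnames(dailyActivity)
head(sleepDay)
colnames(sleepDay)
# Count the distinct users and the rows in each table.
n_distinct(dailyActivity$Id)
n_distinct(sleepDay$Id)
nrow(dailyActivity)
nrow(sleepDay)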
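# Summary statistics for the activity columns of interest.
dailyActivity %>%
  select(TotalSteps, TotalDistance, SedentaryMinutes) %>%
  summary()
# Summary statistics for the sleep columns of interest.
sleepDay %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()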
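# Scatter plots: steps against sedentary minutes, and minutes asleep against time in bed.
ggplot(data = dailyActivity, aes(x = TotalSteps, y = SedentaryMinutes)) + geom_point()
ggplot(data = sleepDay, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) + geom_point()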
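# Inner join on Id: keeps only users present in both tables.
# Note: joining on Id alone pairs every sleep record with every activity record of the same user.
combine_data <- merge(sleepDay, dailyActivity, by = "Id")
n_distinct(combine_data$Id)
# Full join on Id: keeps users that appear in either table (dplyr is already loaded with tidyverse).
combine_data2 <- full_join(sleepDay, dailyActivity, by = "Id")
n_distinct(combine_data2$Id)
head(combine_data2)
# Scatter plot of minutes asleep against total steps.
ggplot(data = combine_data2, aes(x = TotalMinutesAsleep, y = TotalSteps)) + geom_point()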
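Because each user has several days of records in both tables, a join on Id alone matches every sleep row with every activity row for that user. A possible refinement, sketched below on toy data, is to join on both the user and the day. The data frames `activity` and `sleep`, the column names `ActivityDate` and `SleepDay`, and the date formats are assumptions for illustration and should be checked against the real files.

```r
library(dplyr)

# Toy stand-ins for the two tables; column names and date formats are assumed.
activity <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(8000, 12000, 3000)
)
sleep <- data.frame(
  Id = c(1, 1),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(420, 480)
)

# Joining on Id alone: user 1 has 2 sleep rows x 2 activity rows = 4 rows.
by_id <- merge(sleep, activity, by = "Id")
print(nrow(by_id))

# Parse both date columns to Date (the trailing time in SleepDay is ignored).
activity$Date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$Date <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")

# Joining on user and day gives one row per user-day present in both tables.
by_id_date <- inner_join(sleep, activity, by = c("Id", "Date"))
print(nrow(by_id_date))
```

With the day included in the join key, each point in a sleep-versus-steps plot corresponds to a single user-day rather than to a pairing of unrelated days.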
SHARE
The data was visualized in Tableau Public; the links are below.
https://public.tableau.com/app/profile/shahtaj.hyder.khan/viz/WorkoutCapstone/Sheet1
https://public.tableau.com/app/profile/shahtaj.hyder.khan/viz/SleepDay_16619761800880/Sheet1
https://public.tableau.com/app/profile/shahtaj.hyder.khan/viz/StepsandSleep_16619776010340/Sheet1
I found that people who logged a very high number of steps per day also tired out and had a lot of sedentary minutes. People who logged very few steps also had a lot of sedentary minutes. People who logged a moderate 4,000 to 12,000 steps a day were active for most of their waking hours.
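One way to check this pattern in R is to group days into step bands and compare the average sedentary minutes in each band. The sketch below uses made-up values in a hypothetical `daily` data frame; the band cut-offs follow the 4,000 and 12,000 step marks mentioned above.

```r
library(dplyr)

# Hypothetical daily records; the values are invented for illustration.
daily <- data.frame(
  TotalSteps = c(1500, 3000, 6000, 9000, 11000, 15000, 18000),
  SedentaryMinutes = c(1200, 1100, 700, 650, 720, 1000, 1050)
)

# Assign each day to a step band, then average sedentary minutes per band.
step_summary <- daily %>%
  mutate(StepBand = cut(TotalSteps,
                        breaks = c(-Inf, 4000, 12000, Inf),
                        labels = c("under 4K", "4K-12K", "over 12K"))) %>%
  group_by(StepBand) %>%
  summarise(MeanSedentaryMinutes = mean(SedentaryMinutes), Days = n())

print(step_summary)
```

Applied to `dailyActivity`, the same grouping would show whether the middle band really has the fewest sedentary minutes.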
Tracking time in bed alongside time actually asleep also helps people work out how many hours of sleep they actually need.
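The gap between time in bed and time asleep can be made explicit with two derived columns. This is a minimal sketch on invented values; the names `MinutesAwakeInBed` and `SleepEfficiency` are my own labels and are not columns in the data set.

```r
# Hypothetical nightly records using the sleepDay column names.
sleep <- data.frame(
  TotalMinutesAsleep = c(420, 360, 480),
  TotalTimeInBed = c(450, 480, 500)
)

# Minutes spent in bed but not asleep.
sleep$MinutesAwakeInBed <- sleep$TotalTimeInBed - sleep$TotalMinutesAsleep

# Share of time in bed that was spent asleep (1 means asleep the whole time).
sleep$SleepEfficiency <- sleep$TotalMinutesAsleep / sleep$TotalTimeInBed

print(sleep)
```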
Comparing the steps taken each day with the hours asleep shows a trend: people who average around 10,000 steps daily get a good 8 hours of sleep.
ACT
I would recommend that the stakeholders allow further analysis to see whether Bellabeat customers have features on their devices that measure similar things, such as steps taken, hours slept, and sedentary minutes. If not, I would recommend adding those features so that marketing strategists can use them as a selling point. If those features already exist on Bellabeat devices, it would be beneficial to highlight them in the marketing strategy. I would also recommend further analysis of heart rate and body mass index measurements from Bellabeat users to see whether they could be used in the marketing strategy.