In this analysis, we study a non-Bellabeat dataset: we cleanse the data, validate it and check its reliability, analyze it, identify trends, visualize patterns, and draw meaningful insights to support Bellabeat's vision.
Urška Sršen, co-founder of Bellabeat, asked the analytics team to slice, dice, and drill into usage data from non-Bellabeat smart devices. The resulting trends are then to be applied to one of Bellabeat's smart devices to understand user behavior and draw insights that are helpful for the marketing strategy.
We use FitBit Fitness Tracker data, which is in the public domain and hosted on the Kaggle platform. It was made available through Mobius with the consent of thirty FitBit users.
In this stage, we process the data for analysis, using Excel as a tool for checking errors and data integrity, cleaning the data, and documenting the cleaning process.
Using these tools, we perform various calculations, aggregate the data, and identify trends and relationships.
Using Tableau or the built-in functions of Excel, we visualize the trends and prepare a presentation to showcase our findings to the key stakeholders.
Finally, we provide our conclusions and recommendations so that the marketing team can take appropriate action, and we check for any additional data that could expand the analysis.
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.
Analyze customers' use of an existing competitor's products to identify potential opportunities for growth and recommendations for Bellabeat's marketing strategy.
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
Urška Sršen - Bellabeat’s cofounder and Chief Creative Officer
Sando Mur - Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
The data for this analysis comes from the FitBit Fitness Tracker Data on Kaggle. These 18 datasets were generated by respondents between 05.01.2016–05.12.2016. Thirty Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.
Limitations for this data exist due to the sample size and the absence of key characteristics of the participants, such as gender, age, location, and lifestyle.
For this analysis, the datasets for daily activity, daily calories, daily intensities, daily steps, heart rate by seconds, minute METs, daily sleep, and weight log information will be downloaded to the local machine and used.
RStudio is used to complete this analysis because it makes it easy to access the data and draw visualizations. The data is organized in long format, so it poses no issues for the visualizations.
R - Reliable: the data is downloaded from the Kaggle website and has a proper public domain license (https://creativecommons.org/publicdomain/zero/1.0/).
O - Original: the data was collected from https://zenodo.org/record/53894#.X9oeh3Uzaao.
C - Comprehensive: although it has a few limitations due to the sample size and the absence of key participant characteristics (gender, age, location, lifestyle), the data present is sufficient to proceed with the analysis.
C - Current: the data available is within the span of 5 years, so it is feasible to use it for this analysis.
C - Cited: the data is cited and used under the license at https://creativecommons.org/publicdomain/zero/1.0/.
Viewing the data frames to check for any errors or problems with data types (using RStudio):
To ensure the data frames were imported correctly, we can use the glimpse(), summary(), head(), colnames(), and View() functions.
Let's look at the "daily_activity" data frame using the functions mentioned above:
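A minimal sketch of such an inspection is given here. It uses a small hand-made stand-in for daily_activity: the column names and values are assumptions for illustration, not rows taken from the Kaggle file. Base R's str() is used in place of dplyr's glimpse() so the example runs without extra packages.

```r
# Hypothetical stand-in for the imported daily_activity data frame.
daily_activity <- data.frame(
  Id = c(1503960366, 1503960366, 1624580081),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162L, 10735L, 8163L),
  Calories = c(1985L, 1797L, 1432L),
  stringsAsFactors = FALSE
)

head(daily_activity)       # first rows of the data frame
colnames(daily_activity)   # column names
str(daily_activity)        # compact structure: each column with its type
summary(daily_activity)    # per-column summary statistics

# The date column is still plain text at this point.
date_class <- class(daily_activity$ActivityDate)
print(date_class)
```

In this toy frame the date column is reported as character, which is the kind of type problem the inspection is meant to surface.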
In the head(daily_activity) output above, you can see that Date is not in a proper date format and is recognized as a character string. Hence, we need to convert it to a proper Date type. (This can be done with base R's as.Date() function or with the lubridate package.)
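A sketch of that conversion with base R's as.Date() might look like this. The column name ActivityDate and the month/day/year text layout are assumptions made for the example, so the format string should be adjusted to whatever the imported file actually contains.

```r
# Hypothetical two-row sample with dates stored as text (assumed m/d/Y layout).
daily_activity <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  stringsAsFactors = FALSE
)

# Convert the character column to Date, stating the input format explicitly.
daily_activity$ActivityDate <- as.Date(
  daily_activity$ActivityDate,
  format = "%m/%d/%Y"
)

class(daily_activity$ActivityDate)
str(daily_activity)
```

Stating the format explicitly avoids silent NA values when the text does not follow the ISO year-month-day layout that as.Date() tries by default.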
Pro tip: If there are multiple date formats and the number of rows is small, the data types can also be changed using Power Query in Excel.
In the head(daily_steps) output above, you can see that Date is not in a proper date format and is recognized as a character string. Hence, we need to convert it to a proper Date type. (This can be done with base R's as.Date() function or with the lubridate package.)
After the conversion, the Date type is shown beneath the Date column name.
In the head(heart_rate_sec) output above, you can see that DateTime is not in a proper datetime format and is recognized as a character string. Hence, we need to convert it to a proper datetime type. (This can be done with base R's as.POSIXct() function or with the lubridate package.)
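A sketch of the datetime conversion with as.POSIXct() is given here. The column names (Time, Value), the sample rows, and the 12-hour "m/d/Y h:m:s AM/PM" layout are assumptions for illustration. Parsing the AM/PM marker with %p also assumes an English-style locale.

```r
# Hypothetical sample with timestamps stored as text.
heart_rate_sec <- data.frame(
  Id = c(2022484408, 2022484408),
  Time = c("4/12/2016 7:21:00 AM", "4/12/2016 7:21:05 PM"),
  Value = c(97L, 102L),
  stringsAsFactors = FALSE
)

# %I is the hour on a 12-hour clock and %p the AM/PM marker.
# tz is fixed to UTC so the result does not depend on the machine's time zone.
heart_rate_sec$Time <- as.POSIXct(
  heart_rate_sec$Time,
  format = "%m/%d/%Y %I:%M:%S %p",
  tz = "UTC"
)

class(heart_rate_sec$Time)
format(heart_rate_sec$Time, "%H:%M:%S")
```

The same pattern would apply to the other timestamp columns, changing only the data frame and column names.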
After the conversion, the datetime (POSIXct) type is shown beneath the column name.
In the head(minute_METs) output above, you can see that DateTime is not in a proper datetime format and is recognized as a character string. Hence, we need to convert it to a proper datetime type. (This can be done with base R's as.POSIXct() function or with the lubridate package.)
After the conversion, the datetime (POSIXct) type is shown beneath the column name.
In the head(sleep_day) output above, you can see that DateTime is not in a proper datetime format and is recognized as a character string. Hence, we need to convert it to a proper datetime type. (This can be done with base R's as.POSIXct() function or with the lubridate package.)
Before changing the DateTime format, let's rename the timestamp column SleepDay to "Sleep_DateTime". A column name can be changed using the colnames() function.
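One way to do the rename and the subsequent conversion is sketched here. The sample rows, the TotalMinutesAsleep column, and the timestamp layout are assumptions for illustration. Only the names SleepDay and Sleep_DateTime come from the text above.

```r
# Hypothetical sample of the sleep_day data frame.
sleep_day <- data.frame(
  Id = c(1503960366, 1503960366),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327L, 384L),
  stringsAsFactors = FALSE
)

# Rename only the matching column; all other names stay untouched.
colnames(sleep_day)[colnames(sleep_day) == "SleepDay"] <- "Sleep_DateTime"

# Convert the renamed column from text to POSIXct (assumed 12-hour layout).
sleep_day$Sleep_DateTime <- as.POSIXct(
  sleep_day$Sleep_DateTime,
  format = "%m/%d/%Y %I:%M:%S %p",
  tz = "UTC"
)

colnames(sleep_day)
class(sleep_day$Sleep_DateTime)
```

Selecting the column by name rather than by position keeps the rename correct even if the column order changes between imports.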
After the conversion, the datetime (POSIXct) type is shown beneath the column name.
In the head(weight_log) output above, you can see that DateTime is not in a proper datetime format and is recognized as a character string. Hence, we need to convert it to a proper datetime type. (This can be done with base R's as.POSIXct() function or with the lubridate package.)
All eight of the data frames contain the “Id” column, so it is possible to merge all of them if needed. The daily_activity data frame appears to contain data for calories, intensities, and steps. In order to use the daily_activity frame in place of daily_calories, daily_intensities, and daily_steps, the number of observations must be the same and the observations must match for each ID number.
Let's check for duplicates: the number of distinct rows should be the same in all four datasets.
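The checks described above could be sketched as follows, using two small hypothetical frames in place of daily_activity and daily_steps. The column names (ActivityDate, ActivityDay, TotalSteps, StepTotal) and the values are assumptions for illustration. The same comparison would be repeated for the calories and intensities frames.

```r
# Hypothetical stand-ins for two of the four data frames.
daily_activity <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  TotalSteps = c(13162, 10735, 8163, 7007),
  Calories = c(1985, 1797, 1432, 1411)
)
daily_steps <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDay = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  StepTotal = c(13162, 10735, 8163, 7007)
)

# 1. Duplicates: distinct rows should equal total rows.
rows_total <- nrow(daily_activity)
rows_distinct <- nrow(unique(daily_activity))
any_duplicates <- anyDuplicated(daily_activity) > 0

# 2. Same participants in both frames.
same_ids <- setequal(daily_activity$Id, daily_steps$Id)

# 3. Same observations: align on Id and date, then compare the step counts.
merged <- merge(
  daily_activity, daily_steps,
  by.x = c("Id", "ActivityDate"),
  by.y = c("Id", "ActivityDay")
)
steps_match <- nrow(merged) == nrow(daily_activity) &&
  all(merged$TotalSteps == merged$StepTotal)

c(rows_total = rows_total, rows_distinct = rows_distinct)
c(any_duplicates = any_duplicates, same_ids = same_ids, steps_match = steps_match)
```

If all three checks pass for each of the smaller frames, using daily_activity in their place loses no information.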
The summary shows that the average user takes 7638 steps a day, which is less than the CDC recommendation of 9000 to 10000 steps per day. On average, users spend 21.16 minutes a day in very active activity, which is 148.12 minutes a week. The CDC recommends 150 minutes of aerobic activity per week, so users' activity is still slightly below the recommendation. Now consider sedentary time: the data frame shows an average of 16.52 hours a day, meaning users are inactive or barely active for 16.52 hours of each day. Prolonged inactivity of this kind can contribute to other health issues. Research suggests that 40 minutes of moderate to vigorous activity a day balances out the effects of 10 hours of sedentary time a day.
Further, users burn about 2300 calories per day, which is more than the suggested average of 1900 calories.
MET (metabolic equivalent of task) is the ratio of your working metabolic rate to your resting metabolic rate. One MET is the energy used when at rest, so an activity with a MET of four requires a person to exert four times the energy they do when sitting. Users have an average MET of 14.69. That is an implausibly high average for a whole day, which leads to the assumption that the FitBit is not calculating this data point correctly. For this reason, the minute MET data frame will not be used further in this analysis.
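The arithmetic behind these activity figures can be reproduced in a few lines. The inputs are the averages quoted above. The variable names, and the extra figures for the weekly shortfall and the sedentary share of the day, are added here only for illustration.

```r
# Averages quoted in the summary above.
very_active_min_per_day <- 21.16
sedentary_hours_per_day <- 16.52
cdc_weekly_target_min <- 150

# Daily very-active minutes scaled to a week, and the gap to the 150-minute target.
very_active_min_per_week <- very_active_min_per_day * 7
weekly_shortfall_min <- cdc_weekly_target_min - very_active_min_per_week

# Share of a 24-hour day spent sedentary.
sedentary_share_of_day <- sedentary_hours_per_day / 24

very_active_min_per_week                  # 148.12
weekly_shortfall_min                      # 1.88
round(100 * sedentary_share_of_day, 1)    # 68.8 (percent)
```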
We use the ggplot() function in RStudio to create visuals that depict patterns and trends.
Important Points:
1. If echo=FALSE is set in the {r} chunk header, then after rendering the document (that is, after knitting the Rmd file) the code is hidden and only the output is shown. For the purposes of this analysis we leave it out, so the code stays visible.
2. If se=FALSE is not set in geom_smooth(), a confidence band is drawn around the line. With se=FALSE, only the smooth line is shown, without a confidence band.
3. span = 0.1 is used to show how the line varies locally. Increasing it to span = 1 produces a much smoother line. Values between 0.1 and 1 are typical.
4. The size, shape, and color properties can be used in geom_*() functions to customize the graphs as required.
5. In geom_smooth(), the line is fitted with the "loess" method, which is the default when there are fewer than 1000 points. For larger data, the "gam" method is used instead.
6. To give the graphs a title, subtitle, and caption, use the labs() function with the arguments title = " ", subtitle = " ", and caption = " " (the caption appears at the bottom right corner of the viz).
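The plotting points above can be illustrated with a short sketch. It assumes the ggplot2 package is installed and uses randomly generated toy data, not the FitBit data. The column names, colors, and label text are placeholders. A span of 0.5 is used because the toy data has only 20 points, and a very small span such as 0.1 needs many more points per local fit.

```r
library(ggplot2)

# Hypothetical toy data: 20 days of step counts and calories burned.
set.seed(1)
toy <- data.frame(TotalSteps = seq(1000, 20000, by = 1000))
toy$Calories <- 1500 + 0.08 * toy$TotalSteps + rnorm(nrow(toy), sd = 100)

p <- ggplot(toy, aes(x = TotalSteps, y = Calories)) +
  # Point 4: size, shape, and color customize the points.
  geom_point(size = 2, shape = 16, color = "steelblue") +
  # Points 2, 3, 5: loess fit, no confidence band, span controls smoothness.
  geom_smooth(method = "loess", se = FALSE, span = 0.5, color = "darkred") +
  # Point 6: title, subtitle, caption, and axis titles.
  labs(
    title = "Daily steps vs. calories burned",
    subtitle = "Toy data for illustration",
    caption = "Illustrative values only",
    x = "Daily steps",
    y = "Calories burned"
  )

print(p)
```

In an R Markdown document this code would sit in an {r} chunk. Adding echo=FALSE to that chunk header would hide the code and keep only the plot, as described in point 1.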
The above viz indicates that the data collected by the FitBit tracker is intact.
Now, let's visualize daily steps vs. calories burned. The x and y axis titles have been changed.
The above viz displays a positive correlation between daily steps taken and daily calories burned. This suggests that users need to take more steps to burn more calories.
In other words, the more time users spend on physical activity, the more calories they burn.
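A relationship like this can also be quantified with a correlation coefficient. The sketch below uses base R's cor() on made-up values. The numbers and column names are assumptions for illustration, not results from the FitBit data.

```r
# Hypothetical daily totals for six days.
toy <- data.frame(
  TotalSteps = c(2000, 4000, 6000, 8000, 10000, 12000),
  Calories   = c(1700, 1900, 2050, 2300, 2500, 2800)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship.
r <- cor(toy$TotalSteps, toy$Calories)
r
r > 0
```

A positive coefficient is consistent with the upward trend in the scatter plot. It describes association only and does not show that steps alone drive the calories burned.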
Since its inception, Bellabeat has been successful in empowering women by providing data on their activity, sleep, stress, hydration levels, and reproductive health. Based on analyzing how Fitbit consumers use and respond to features, recommendations can be made to promote further growth for Bellabeat.
The Bellabeat app should be completely revamped. Rather than simply providing data on users' health, the app should also encourage users to meet fitness goals and become a friendly platform.
The CDC recommends working out with a friend in order to feel more motivated, be more adventurous in trying workouts, and be consistent. CDC even recommends the use of a workout app to connect with friends and reach your goals. The Bellabeat app could become that workout app that women turn to, by creating a sisterhood of supportive women ready to prioritize their health.
1. Enable social networking so users can post their favorite workouts, wellness tips, healthy meals, etc.
2. Enable users to add friends and view each other’s activity.
3. Create weekly fitness and wellness challenges to encourage use.
4. Have health and fitness companies pay for advertising.
5. Recommend that users get 10,000 steps a day and enable alert notifications to encourage them to meet this goal.
6. Recommend that users get at least 7 hours of sleep a night and enable alert notifications to encourage them to meet this goal.
7. Recommend that users get 150 minutes of vigorous activity a week and enable alert notifications to encourage them to meet this goal.
8. Encourage users to provide weight and height to track BMI.
9. If users are interested in losing weight, enable notifications to keep users on track to burn necessary calories to meet goal.
10. Enable alert notifications if user’s heart rate varies significantly from their normal.
11. Enable notifications to encourage activity if a user has spent 45 minutes in bed awake.
12. Enable notifications to encourage activity if a user has been sedentary for an extended period of time.
1. Offer 30-day free trial subscription.
2. Offer reduced subscription fee when a member refers a friend.
3. Offer discounts for Bellabeat smart device products with membership.
4. Partner with health & fitness companies and offer discounts for members.
1. Heavily market Spring as Fitbit does not track hydration levels.
2. Offer a bundle deal for the Spring and Leaf together
“3 Reasons to Work Out with a Friend.” Centers for Disease Control and Prevention, 23 Apr. 2021, www.cdc.gov/diabetes/library/spotlights/workout-buddy.html
“About Adult BMI.” Centers for Disease Control and Prevention, 17 Sept. 2020, www.cdc.gov/healthyweight/assessing/bmi/adult_bmi/index.html
“The Dangers of Sitting: Why Sitting Is the New Smoking.” Better Health Channel, 22 Aug. 2020, www.betterhealth.vic.gov.au/health/healthyliving/the-dangers-of-sitting