Introduction
This is my Google data analytic Capstone project. In this case study I analyzed a public dataset for a fictional company.
Scenario
In this capstone I am a junior data analyst. The company Cyclistic, a bike share company in Chicago. Me and my team are to analyze all the datasets and figure out the difference between casual and member(subscribed) riders. With this analysis my team is to come up with a strategy to convert casual riders into member riders. A casual rider is someone who only purchases a day pass or a single ride pass. A member rider is someone who has an annual pass.
Ask
My questions that i asked my self while completing this capstone are :
How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members ?
Prepare
The data that I used for the analysis was a set of datasets provided to me by the Google Data Analyst Capstone ( link : https://divvy-tripdata.s3.amazonaws.com/index.html, The data has been provided by Motive Internal Inc. under this link https://www.divvybikes.com/data-license-agreement). This makes my data reliable since it comes from a reliable source.
Snippets of code used for analysis in R
combining datasets
inspecting data
cleaning and filtering
creating new data frames
Writing csv files for Tableau (create visualizations)
#Install necessary packages and upload libraries
install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")
library("tidyverse")
library("lubridate")
library("ggplot2")
# Upload Trip datasets 2021
jan_2021 <- read.csv("202101-divvy-tripdata.csv")
feb_2021 <- read.csv("202102-divvy-tripdata.csv")
mar_2021 <- read.csv("202103-divvy-tripdata.csv")
apr_2021 <- read.csv("202104-divvy-tripdata.csv")
may_2021 <- read.csv("202105-divvy-tripdata.csv")
Jun_2021 <- read.csv("202106-divvy-tripdata.csv")
jul_2021 <- read.csv("202107-divvy-tripdata.csv")
aug_2021 <- read.csv("202108-divvy-tripdata.csv")
sep_2021 <- read.csv("202109-divvy-tripdata.csv")
oct_2021 <- read.csv("202110-divvy-tripdata.csv")
nov_2021 <- read.csv("202111-divvy-tripdata.csv")
dec_2021 <- read.csv("202112-divvy-tripdata.csv")
# Upload Trip dataset 2022
jan_2022 <- read.csv("202201-divvy-tripdata.csv")
feb_2022 <- read.csv("202202-divvy-tripdata.csv")
mar_2022 <- read.csv("202203-divvy-tripdata.csv")
apr_2022 <- read.csv("202204-divvy-tripdata.csv")
may_2022 <- read.csv("202205-divvy-tripdata.csv")
jun_2022 <- read.csv("202206-divvy-tripdata.csv")
jul_2022 <- read.csv("202207-divvy-tripdata.csv")
#Inspect data frame for incongruencies
#2021 months
str(jan_2021)
str(feb_2021)
str(mar_2021)
str(apr_2021)
str(may_2021)
str(Jun_2021)
str(jul_2021)
str(aug_2021)
str(sep_2021)
str(oct_2021)
str(nov_2021)
str(dec_2021)
#2022 months
str(jan_2022)
str(feb_2022)
str(mar_2022)
str(apr_2022)
str(may_2022)
str(jun_2022)
str(jul_2022)
# Stack individual months data frames into one big data frame
# trips 2021
trips_data <- bind_rows(jan_2021,feb_2021,mar_2021,apr_2021
,may_2021,Jun_2021,jul_2021,aug_2021
,sep_2021,oct_2021,nov_2021,dec_2021)
# trips 2022
trips_v2 <- bind_rows(jan_2022,feb_2022,mar_2022,apr_2022
,may_2022,jun_2022,jul_2022)
summary(trips_data)
summary(trips_v2)
# Adding ride length in seconds for 2021
trips_data$ride_length <- difftime(trips_data$ended_at, trips_data$started_at)
# Inspect the structure of columns for 2021
str(trips_data)
# checking if ride_length is a numeric value
is.numeric(trips_data$ride_length)
trips_data$ride_length <- as.numeric(as.character(trips_data$ride_length))
is.numeric(trips_data$ride_length)
#dataset for 2022
#inspecting table for 2022
colnames(trips_v2)
nrow(trips_v2)
dim(trips_v2)
head(trips_v2)
str(trips_v2)
summary(trips_v2)
unique(trips_v2$member_casual)
#Adding data separating date for 2022
trips_v2$date <- as.Date(trips_v2$started_at)
trips_v2$month <- format(as.Date(trips_v2$date), "%m")
trips_v2$day <- format(as.Date(trips_v2$date), "%d")
trips_v2$year <- format(as.Date(trips_v2$date), "%Y")
trips_v2$day_of_the_week <- format(as.Date(trips_v2$date), "%A")
#Adding ride length in seconds for 2022
trips_v2$ride_length <- difftime(trips_v2$ended_at, trips_v2$started_at)
#inspecting
str(trips_v2)
trips_v2$ride_length <- as.numeric(as.character(trips_v2$ride_length))
is.numeric(trips_v2$ride_length)
#Creating a new data
trips_data_v2 <- trips_data[!(trips_data$ride_length == 0 | trips_data$ride_length < 0),] #removing all rows with non
trips_v2_v2 <- trips_v2[!(trips_v2$ride_length == 0 | trips_v2$ride_length < 0),]
# create new data frame based on number of rides and average ride lengths for 2021
ride_trips <- trips_data_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday)
ride_trips_month <- trips_data_v2 %>%
group_by(member_casual, month) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, month)
# Writing a file in csv format for 2021
write.csv(ride_trips
, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2021_avg_ride_trips_trips.csv")
write.csv(ride_trips_month
, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2021_avg_month_ride_trips_trips.csv")
# create new data frame based on number of rides and average ride lengths for 2022
ride_trip_2022 <- trips_v2_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>% #groups by usertype and weekday
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday)
ride_trip_2022_month <- trips_v2_v2 %>%
group_by(member_casual, month) %>%
,average_duration = mean(ride_length)) %>%
arrange(member_casual, month)
# Writing a file in csv format for 2022
write.csv(ride_trip_2022
, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2022_avg_ride_trips_trips.csv")
write.csv(ride_trip_2022_month
, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2022_avg_month_ride_trips_trips.csv")
Share
I used Tableau and imported and used the data frames in csv format in order to make bar graphs that represented my finding that where filtered and cleaned in R.
Number of rides based on days of the week
Number of rides based on months of the year
Findings
The analysis demonstrates the difference between casual riders and annual members. In the data sets provided of a pattern from 2021 through 2022. Since the records are from January to July from 2022, we are able to compare and analyze. Records show that most bike rentals spike during summer. However, the increase of casual riders spikes significantly more than annual members. Casual riders surpass annual members starting June and decrease around September. Another finding is that on average casual riders rent more bikes during the weekends when comparing days of the week.
Suggestions
Based on the findings of the analysis there are a few suggestion to help turn casual riders into annual members.
Offer discounts to casual riders for upgrading from any kind of pass to an annual pass all summer long. Start announcing the discount at the beginning of May.
Start a new kind of pass, either summer pass or a half year pass.
Increase ads on fitness apps and music streaming apps. Reach out to influencers who are pro-earth and pro-health (especially during summer and weekends).