Google Data Analytics Capstone:

Casual vs Member Cyclist

Introduction

This is my Google data analytic Capstone project. In this case study I analyzed a public dataset for a fictional company.

Scenario

In this capstone I am a junior data analyst. The company Cyclistic, a bike share company in Chicago. Me and my team are to analyze all the datasets and figure out the difference between casual and member(subscribed) riders. With this analysis my team is to come up with a strategy to convert casual riders into member riders. A casual rider is someone who only purchases a day pass or a single ride pass. A member rider is someone who has an annual pass.

Ask

My questions that i asked my self while completing this capstone are :

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members ?

Prepare

The data that I used for the analysis was a set of datasets provided to me by the Google Data Analyst Capstone ( link : https://divvy-tripdata.s3.amazonaws.com/index.html, The data has been provided by Motive Internal Inc. under this link https://www.divvybikes.com/data-license-agreement). This makes my data reliable since it comes from a reliable source.

Snippets of code used for analysis in R

combining datasets
inspecting data
cleaning and filtering
creating new data frames
Writing csv files for Tableau (create visualizations)

Packages and Libraries needed

#Install necessary packages and upload libraries

install.packages("tidyverse")

install.packages("lubridate")

install.packages("ggplot2")

library("tidyverse")

library("lubridate")

library("ggplot2")

Reading and storing csv files (spreadsheets)

# Upload Trip datasets 2021

jan_2021 <- read.csv("202101-divvy-tripdata.csv")

feb_2021 <- read.csv("202102-divvy-tripdata.csv")

mar_2021 <- read.csv("202103-divvy-tripdata.csv")

apr_2021 <- read.csv("202104-divvy-tripdata.csv")

may_2021 <- read.csv("202105-divvy-tripdata.csv")

Jun_2021 <- read.csv("202106-divvy-tripdata.csv")

jul_2021 <- read.csv("202107-divvy-tripdata.csv")

aug_2021 <- read.csv("202108-divvy-tripdata.csv")

sep_2021 <- read.csv("202109-divvy-tripdata.csv")

oct_2021 <- read.csv("202110-divvy-tripdata.csv")

nov_2021 <- read.csv("202111-divvy-tripdata.csv")

dec_2021 <- read.csv("202112-divvy-tripdata.csv")

# Upload Trip dataset 2022

jan_2022 <- read.csv("202201-divvy-tripdata.csv")

feb_2022 <- read.csv("202202-divvy-tripdata.csv")

mar_2022 <- read.csv("202203-divvy-tripdata.csv")

apr_2022 <- read.csv("202204-divvy-tripdata.csv")

may_2022 <- read.csv("202205-divvy-tripdata.csv")

jun_2022 <- read.csv("202206-divvy-tripdata.csv")

jul_2022 <- read.csv("202207-divvy-tripdata.csv")

Inspecting data frame for incongruencies

#Inspect data frame for incongruencies

#2021 months

str(jan_2021)

str(feb_2021)

str(mar_2021)

str(apr_2021)

str(may_2021)

str(Jun_2021)

str(jul_2021)

str(aug_2021)

str(sep_2021)

str(oct_2021)

str(nov_2021)

str(dec_2021)

#2022 months

str(jan_2022)

str(feb_2022)

str(mar_2022)

str(apr_2022)

str(may_2022)

str(jun_2022)

str(jul_2022)

Stack individual months data frames into one big data frame and inspecting it

# Stack individual months data frames into one big data frame

# trips 2021

trips_data <- bind_rows(jan_2021,feb_2021,mar_2021,apr_2021

,may_2021,Jun_2021,jul_2021,aug_2021

,sep_2021,oct_2021,nov_2021,dec_2021)

# trips 2022

trips_v2 <- bind_rows(jan_2022,feb_2022,mar_2022,apr_2022

,may_2022,jun_2022,jul_2022)

summary(trips_data)

summary(trips_v2)

Adding data : Separating date columns and creating new data frame

# Adding ride length in seconds for 2021

trips_data$ride_length <- difftime(trips_data$ended_at, trips_data$started_at)

# Inspect the structure of columns for 2021

str(trips_data)

# checking if ride_length is a numeric value

is.numeric(trips_data$ride_length)

trips_data$ride_length <- as.numeric(as.character(trips_data$ride_length))

is.numeric(trips_data$ride_length)

#dataset for 2022

#inspecting table for 2022

colnames(trips_v2)

nrow(trips_v2)

dim(trips_v2)

head(trips_v2)

str(trips_v2)

summary(trips_v2)

unique(trips_v2$member_casual)

#Adding data separating date for 2022

trips_v2$date <- as.Date(trips_v2$started_at)

trips_v2$month <- format(as.Date(trips_v2$date), "%m")

trips_v2$day <- format(as.Date(trips_v2$date), "%d")

trips_v2$year <- format(as.Date(trips_v2$date), "%Y")

trips_v2$day_of_the_week <- format(as.Date(trips_v2$date), "%A")

#Adding ride length in seconds for 2022

trips_v2$ride_length <- difftime(trips_v2$ended_at, trips_v2$started_at)

#inspecting

str(trips_v2)

trips_v2$ride_length <- as.numeric(as.character(trips_v2$ride_length))

is.numeric(trips_v2$ride_length)

#Creating a new data

trips_data_v2 <- trips_data[!(trips_data$ride_length == 0 | trips_data$ride_length < 0),] #removing all rows with non

trips_v2_v2 <- trips_v2[!(trips_v2$ride_length == 0 | trips_v2$ride_length < 0),]

Creating data frames that have mean and max values

# create new data frame based on number of rides and average ride lengths for 2021

ride_trips <- trips_data_v2 %>%

mutate(weekday = wday(started_at, label = TRUE)) %>%

group_by(member_casual, weekday) %>%

summarise(number_of_rides = n()

,average_duration = mean(ride_length)) %>%

arrange(member_casual, weekday)

ride_trips_month <- trips_data_v2 %>%

group_by(member_casual, month) %>%

summarise(number_of_rides = n()

,average_duration = mean(ride_length)) %>%

arrange(member_casual, month)

# Writing a file in csv format for 2021

write.csv(ride_trips

, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2021_avg_ride_trips_trips.csv")

write.csv(ride_trips_month

, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2021_avg_month_ride_trips_trips.csv")

# create new data frame based on number of rides and average ride lengths for 2022

ride_trip_2022 <- trips_v2_v2 %>%

mutate(weekday = wday(started_at, label = TRUE)) %>%

group_by(member_casual, weekday) %>% #groups by usertype and weekday

summarise(number_of_rides = n()

,average_duration = mean(ride_length)) %>%

arrange(member_casual, weekday)

ride_trip_2022_month <- trips_v2_v2 %>%

group_by(member_casual, month) %>%

,average_duration = mean(ride_length)) %>%

arrange(member_casual, month)

# Writing a file in csv format for 2022

write.csv(ride_trip_2022

, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2022_avg_ride_trips_trips.csv")

write.csv(ride_trip_2022_month

, file = "C:/Users/bravo/OneDrive/Desktop/kevinssssssssssss/Capstone/2022_avg_month_ride_trips_trips.csv")

Share

I used Tableau and imported and used the data frames in csv format in order to make bar graphs that represented my finding that where filtered and cleaned in R.

Number of rides based on days of the week

Number of rides based on months of the year

Findings

The analysis demonstrates the difference between casual riders and annual members. In the data sets provided of a pattern from 2021 through 2022. Since the records are from January to July from 2022, we are able to compare and analyze. Records show that most bike rentals spike during summer. However, the increase of casual riders spikes significantly more than annual members. Casual riders surpass annual members starting June and decrease around September. Another finding is that on average casual riders rent more bikes during the weekends when comparing days of the week.

Suggestions

Based on the findings of the analysis there are a few suggestion to help turn casual riders into annual members.

Offer discounts to casual riders for upgrading from any kind of pass to an annual pass all summer long. Start announcing the discount at the beginning of May.
Start a new kind of pass, either summer pass or a half year pass.
Increase ads on fitness apps and music streaming apps. Reach out to influencers who are pro-earth and pro-health (especially during summer and weekends).

Page updated

Google Sites

Report abuse