CYCLISTIC BIKES
(Google Data Analytics Capstone Project)
I'm a new junior data analyst at Cyclistic Company, working with the marketing analysis team. Cyclistic is a Chicago-based bike-share firm with over 5800 bicycles and 600 docking stations.
Lily Morino, my manager and head of marketing, is in charge of developing campaigns and initiatives to promote the bike share program. My team is in charge of collecting, analysing, and reporting data that will help steer Cyclistic's marketing approach. The Cyclistic executive team is in charge of approving the recommended marketing scheme.
Under Cyclistic's flexible plan, a bike can be unlocked at one station and returned to any other station in the system at any time. Riders who purchase annual memberships are referred to as Cyclistic members, whereas customers who purchase single-ride or full-day passes are referred to as casual riders. Cyclistic's finance analysts have concluded that annual members are far more profitable than casual riders. The Director of Marketing believes that increasing the number of annual memberships is the key to the company's future growth, and that there is a good chance of converting casual riders into Cyclistic members. Morino has therefore set a clear goal: design marketing strategies aimed at converting casual riders into Cyclistic members. To support this goal, my team first wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, the marketing team will design a new strategy to convert casual riders.
To achieve this goal, the marketing analytics team must understand how annual members and casual riders differ, why casual riders would purchase a membership, and how digital media could affect marketing tactics. Morino and the team want to analyze historical bike trip data to identify trends. Three questions will guide the future marketing program:
1- How do annual members and casual riders use Cyclistic bikes differently?
2- Why would casual riders buy Cyclistic annual memberships?
3- How can Cyclistic use digital media to influence casual riders to become members?
Morino has assigned me the first question, and this work is an opportunity for me to demonstrate my abilities as a junior data analyst.
The team's objective is to design a successful marketing campaign aimed at casual riders and to answer all of the questions raised above. The insights gathered will support the team's mission and the company's profitability. My role is to support that mission by meeting stakeholders' expectations and enabling them to make data-driven decisions.
My team is using the last 12 months of historical trip data collected by the company (here). I created a new folder on my computer called "Cyclistic" to store the data. There are twelve zip files in total. Once unzipped, each is a CSV file covering one month, beginning with June 2021. The company data is relevant, complete, comprehensive, current, and cited.
#checking my working directory
getwd()
setwd("C:/Users/sanjana/Cyclistic")
#setting up my environment
library(tidyverse)
library(anytime)
#importing data
X202106_divvy_tripdata <- read.csv("../input/bike-trips/202106-divvy-tripdata.csv")
X202107_divvy_tripdata <- read.csv("../input/bike-trips/202107-divvy-tripdata.csv")
X202108_divvy_tripdata <- read.csv("../input/bike-trips/202108-divvy-tripdata.csv")
X202109_divvy_tripdata <- read.csv("../input/bike-trips/202109-divvy-tripdata.csv")
X202110_divvy_tripdata <- read.csv("../input/bike-trips/202110-divvy-tripdata.csv")
X202111_divvy_tripdata <- read.csv("../input/bike-trips/202111-divvy-tripdata.csv")
X202112_divvy_tripdata <- read.csv("../input/bike-trips/202112-divvy-tripdata.csv")
X202201_divvy_tripdata <- read.csv("../input/bike-trips/202201-divvy-tripdata.csv")
X202202_divvy_tripdata <- read.csv("../input/bike-trips/202202-divvy-tripdata.csv")
X202203_divvy_tripdata <- read.csv("../input/bike-trips/202203-divvy-tripdata.csv")
X202204_divvy_tripdata <- read.csv("../input/bike-trips/202204-divvy-tripdata.csv")
X202205_divvy_tripdata <- read.csv("../input/bike-trips/202205-divvy-tripdata.csv")
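The twelve imports above follow one pattern, so they could also be written as a loop. The following is only a sketch under assumptions: `read_trip_files` is a hypothetical helper, not part of the original analysis, and the two tiny CSV files are made up so the example runs anywhere without the real data.

```r
# Hypothetical helper: read every monthly trip CSV in a folder and stack the rows.
read_trip_files <- function(dir) {
  paths <- sort(list.files(dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE))
  monthly <- lapply(paths, read.csv)   # one data frame per month
  do.call(rbind, monthly)              # assumes identical columns in every file
}

# Demonstration with two made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "bike-trips-demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("a1", "a2"), member_casual = c("casual", "member")),
          file.path(demo_dir, "202106-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "member"),
          file.path(demo_dir, "202107-divvy-tripdata.csv"), row.names = FALSE)

demo_trips <- read_trip_files(demo_dir)
nrow(demo_trips)
```

With the real data, the same helper would be pointed at the folder that holds the twelve monthly files.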
#viewing data
View(X202106_divvy_tripdata)
View(X202107_divvy_tripdata)
View(X202108_divvy_tripdata)
View(X202109_divvy_tripdata)
View(X202110_divvy_tripdata)
View(X202111_divvy_tripdata)
View(X202112_divvy_tripdata)
View(X202201_divvy_tripdata)
View(X202202_divvy_tripdata)
View(X202203_divvy_tripdata)
View(X202204_divvy_tripdata)
View(X202205_divvy_tripdata)
#checking for errors and consistency:
str(X202106_divvy_tripdata)
str(X202107_divvy_tripdata)
str(X202108_divvy_tripdata)
str(X202109_divvy_tripdata)
str(X202110_divvy_tripdata)
str(X202111_divvy_tripdata)
str(X202112_divvy_tripdata)
str(X202201_divvy_tripdata)
str(X202202_divvy_tripdata)
str(X202203_divvy_tripdata)
str(X202204_divvy_tripdata)
str(X202205_divvy_tripdata)
#checking for the consistency of columns names:
colnames(X202106_divvy_tripdata)
colnames(X202107_divvy_tripdata)
colnames(X202108_divvy_tripdata)
colnames(X202109_divvy_tripdata)
colnames(X202110_divvy_tripdata)
colnames(X202111_divvy_tripdata)
colnames(X202112_divvy_tripdata)
colnames(X202201_divvy_tripdata)
colnames(X202202_divvy_tripdata)
colnames(X202203_divvy_tripdata)
colnames(X202204_divvy_tripdata)
colnames(X202205_divvy_tripdata)
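Comparing twelve printed lists of column names by eye is error-prone, so the check could be automated. This is a minimal sketch, assuming a hypothetical helper `same_columns` and made-up demo data frames; it compares each data frame's column names against the first one.

```r
# Hypothetical helper: TRUE for each data frame whose column names
# (including order) match those of the first data frame in the list.
same_columns <- function(frames) {
  reference <- colnames(frames[[1]])
  vapply(frames, function(df) identical(colnames(df), reference), logical(1))
}

# Made-up frames; the third one has a deliberately different column name
demo_frames <- list(
  june = data.frame(ride_id = "a1", started_at = "2021-06-01 08:00:00"),
  july = data.frame(ride_id = "b1", started_at = "2021-07-01 09:00:00"),
  bad  = data.frame(trip_id = "c1", started_at = "2021-08-01 10:00:00")
)
column_check <- same_columns(demo_frames)
column_check
```

Any `FALSE` entry flags a file whose columns would need to be renamed or reordered before the files are combined with `rbind()`.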
#combining the twelve monthly data frames into a single data frame
dataset <- rbind(X202106_divvy_tripdata, X202107_divvy_tripdata, X202108_divvy_tripdata,
                 X202109_divvy_tripdata, X202110_divvy_tripdata, X202111_divvy_tripdata,
                 X202112_divvy_tripdata, X202201_divvy_tripdata, X202202_divvy_tripdata,
                 X202203_divvy_tripdata, X202204_divvy_tripdata, X202205_divvy_tripdata)
#removing duplicate rows (keeps the first occurrence of each fully identical row)
data <- dataset[!duplicated( dataset), ]
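The step above drops rows only when every column matches. A stricter variant is to treat the trip identifier as the key. The sketch below assumes the trip data has a `ride_id` column that should be unique per trip; that column is not referenced elsewhere in this analysis, and the demo data is made up.

```r
# Made-up example: ride "a2" appears twice
demo_dupes <- data.frame(
  ride_id = c("a1", "a2", "a2", "a3"),
  member_casual = c("casual", "member", "member", "casual"),
  stringsAsFactors = FALSE
)

# Count repeated identifiers, then keep the first row for each ride_id
n_duplicate_ids <- sum(duplicated(demo_dupes$ride_id))
deduplicated <- demo_dupes[!duplicated(demo_dupes$ride_id), ]

n_duplicate_ids
nrow(deduplicated)
```

Reporting the count before dropping rows makes it clear how much data the cleaning step removed.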
#dividing the dataset by rider type (casual riders vs. annual members)
casual_members_data <- filter(data, member_casual=="casual")
annual_members_data <- filter(data, member_casual=="member")
#viewing data
View(casual_members_data)
View(annual_members_data)
We have previously reviewed the data for irregularities, and now we will add new fields and prepare the data for analysis.
#converting started_at and ended_at datatype to date and time
casual_members_data$started_at <- anytime(casual_members_data$started_at)
casual_members_data$ended_at <- anytime(casual_members_data$ended_at)
annual_members_data$started_at <- anytime(annual_members_data$started_at)
annual_members_data$ended_at <- anytime(annual_members_data$ended_at)
#adding ride_length for each ride duration in mins
casual_members_data$ride_length <- difftime(casual_members_data$ended_at, casual_members_data$started_at, units = "mins")
annual_members_data$ride_length <- difftime(annual_members_data$ended_at, annual_members_data$started_at, units = "mins")
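To make the ride length calculation concrete, here is a small worked example on made-up timestamps. It is a sketch only: it uses base R's `as.POSIXct()` instead of `anytime()` so it runs without extra packages, and it assumes the timestamps are in `yyyy-mm-dd hh:mm:ss` form.

```r
# Two made-up rides; the second one crosses midnight
demo_rides <- data.frame(
  started_at = c("2021-06-01 08:00:00", "2021-06-01 23:50:00"),
  ended_at   = c("2021-06-01 08:30:00", "2021-06-02 00:05:00"),
  stringsAsFactors = FALSE
)

# Parse the text columns into date-times (UTC chosen only for the demo)
demo_rides$started_at <- as.POSIXct(demo_rides$started_at,
                                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
demo_rides$ended_at <- as.POSIXct(demo_rides$ended_at,
                                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Ride duration in minutes, stored as a difftime
demo_rides$ride_length <- difftime(demo_rides$ended_at, demo_rides$started_at,
                                   units = "mins")
as.numeric(demo_rides$ride_length)
```

A ride that ends before it starts would produce a negative value here, which is what the later "removing bad data" step filters out.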
#removing bad data
good_casual_data <- casual_members_data[!(casual_members_data$start_station_name == "HQ QR" | casual_members_data$ride_length<0),]
good_annual_data <- annual_members_data[!(annual_members_data$start_station_name == "HQ QR" | annual_members_data$ride_length<0),]
#adding columns that list the date, month, day, year, weekday of each ride
good_casual_data$date <- as.Date(good_casual_data$started_at) #The default format is yyyy-mm-dd
good_casual_data$month <- format(as.Date(good_casual_data$date), "%m")
good_casual_data$day <- format(as.Date(good_casual_data$date), "%d")
good_casual_data$year <- format(as.Date(good_casual_data$date), "%Y")
good_casual_data$day_of_week <- format(as.Date(good_casual_data$date), "%A")
good_annual_data$date <- as.Date(good_annual_data$started_at) #The default format is yyyy-mm-dd
good_annual_data$month <- format(as.Date(good_annual_data$date), "%m")
good_annual_data$day <- format(as.Date(good_annual_data$date), "%d")
good_annual_data$year <- format(as.Date(good_annual_data$date), "%Y")
good_annual_data$day_of_week <- format(as.Date(good_annual_data$date), "%A")
#checking types of bikes
unique(good_casual_data$rideable_type)
unique(good_annual_data$rideable_type)
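`unique()` only lists which bike types occur. To compare how often each rider type uses each bike type, the rides can be cross-tabulated. This is a sketch on made-up rows; the bike type labels are illustrative and the real values should be taken from the `unique()` output above.

```r
# Made-up rides with illustrative bike type labels
demo_bike_rides <- data.frame(
  rideable_type = c("classic_bike", "docked_bike", "electric_bike",
                    "classic_bike", "electric_bike"),
  member_casual = c("member", "casual", "member", "casual", "member"),
  stringsAsFactors = FALSE
)

# Rows are bike types, columns are rider types, cells are ride counts
rides_by_type <- table(demo_bike_rides$rideable_type, demo_bike_rides$member_casual)
rides_by_type
```

Running the same `table()` call on the combined cleaned data would give the counts that sit behind any statement about which riders prefer which bikes.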
#descriptive analysis on ride_length
mean(good_casual_data$ride_length, na.rm = TRUE) #straight average (total ride length / rides)
median(good_casual_data$ride_length,na.rm = TRUE) #midpoint number in the ascending array of ride lengths
max(good_casual_data$ride_length, na.rm = TRUE) #longest ride
min(good_casual_data$ride_length, na.rm = TRUE) #shortest ride
mean(good_annual_data$ride_length, na.rm = TRUE) #straight average (total ride length / rides)
median(good_annual_data$ride_length,na.rm = TRUE) #midpoint number in the ascending array of ride lengths
max(good_annual_data$ride_length, na.rm = TRUE) #longest ride
min(good_annual_data$ride_length, na.rm = TRUE) #shortest ride
sum(good_casual_data$ride_length, na.rm = TRUE)
sum(good_annual_data$ride_length, na.rm = TRUE)
#Average ride_length for each day
good_casual_data$day_of_week <- ordered(good_casual_data$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(good_casual_data$ride_length ~ good_casual_data$member_casual + good_casual_data$day_of_week, FUN = mean)
good_annual_data$day_of_week <- ordered(good_annual_data$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(good_annual_data$ride_length ~ good_annual_data$member_casual + good_annual_data$day_of_week, FUN = mean)
#adding new field representing number of rides
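good_casual_data <- mutate(good_casual_data, num_of_rides = n()) #total rides by casual riders, repeated on every row
good_annual_data <- mutate(good_annual_data, num_of_rides = n()) #total rides by annual members, repeated on every row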
#calculating number of rides and average duration for each weekday
good_casual_data %>%
group_by(day_of_week) %>%
summarise(num_of_rides = n(), avg_duration = mean(ride_length)) %>%
arrange(day_of_week)
good_annual_data %>%
group_by(day_of_week) %>%
summarise(num_of_rides = n(), avg_duration = mean(ride_length)) %>%
arrange(day_of_week)
#combining both datasets
good_data <- rbind(good_annual_data, good_casual_data)
View(good_data)
#Removing NAs
good_data <- na.omit(good_data)
#representing no. of rides and weekday for each member type
good_data %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_rides = n() ,average_duration = mean(ride_length)) %>%
arrange(member_casual, day_of_week) %>%
ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) + geom_col(position = "dodge")
#representing average ride duration for each member type by day of the week
good_data %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_rides = n(),average_duration = mean(ride_length)) %>%
arrange(member_casual, day_of_week) %>%
ggplot(aes(x = day_of_week, y = average_duration, fill = member_casual)) + geom_col(position = "dodge")
Rides by annual members outnumber rides by casual riders.
Docked bikes are favored only by casual riders, while the other bike types are more commonly used by annual members.
Casual riders take longer rides, measured by ride duration.
Tuesdays and Sundays are the busiest days of the week.
Casual riders use Cyclistic bikes the most on Tuesdays, while annual members ride the most on Saturdays.
Plan direct marketing campaigns on Tuesdays that explain the benefits of Cyclistic annual memberships at the stations where casual riders start and end their trips.
Send tailored emails to new and casual riders that emphasize the benefits of annual membership and encourage them to sign up.
Use social media in Chicago to promote the benefits of annual subscriptions.