CYCLISTIC BIKES

(Google Data Analytics Capstone Project)

Introduction

I am a new junior data analyst on the marketing analytics team at Cyclistic, a Chicago-based bike-share company with more than 5,800 bicycles and 600 docking stations.

Lily Morino, my manager and the head of marketing, is responsible for developing campaigns and initiatives to promote the bike-share program. My team is responsible for collecting, analyzing, and reporting data that helps guide Cyclistic's marketing strategy. The Cyclistic executive team decides whether to approve the recommended marketing program.

Bikes can be unlocked at one station and returned to any other station in the system at any time. Cyclistic's flexible pricing plans include single-ride passes, full-day passes, and annual memberships: riders who buy annual memberships are Cyclistic members, while riders who buy single-ride or full-day passes are casual riders. Cyclistic's finance analysts have concluded that annual members are much more profitable than casual riders. Morino, as director of marketing, believes that increasing the number of annual members is the key to the company's future growth, and that there is a good chance of converting casual riders into members. Morino has therefore set a clear goal: design marketing strategies aimed at converting casual riders into Cyclistic members. To support this, my team wants to understand how casual riders and annual members use Cyclistic bikes differently; the marketing team will use these insights to design the new strategy.

To achieve this goal, the marketing analytics team needs to analyze how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect the company's marketing tactics. Morino and the team want to analyze historical bike trip data to identify trends. Three questions will guide the future marketing program:

1- How do annual members and casual riders use Cyclistic bikes differently?

2- Why would casual riders buy a Cyclistic annual membership?

3- How can Cyclistic use digital media to influence casual riders to become members?

Morino has assigned me the first question, and this project is an opportunity to demonstrate my skills as a junior data analyst.

Business Task

The team's objective is to design a successful marketing campaign aimed at casual riders and to answer the three questions above. The insights gathered will support both the team's mission and the company's profitability. My role is to support that mission by meeting stakeholders' expectations and enabling them to make data-driven decisions.

Data Collection

My team is using the company's historical trip data for the last 12 months (here). I created a new folder on my computer called "Cyclistic" to store the data. There are twelve zip files in total; once unzipped, each contains a CSV file covering one month, from June 2021 through May 2022. The company data is relevant, complete, comprehensive, current, and cited.

#checking my working directory

getwd()

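setwd("C:/Users/sanjana/Cyclistic") #note: the read.csv() paths below are relative to the working directory; adjust them to where the CSV files are stored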

#setting up my environment

library(tidyverse)

library(anytime)

#importing data

X202106_divvy_tripdata <- read.csv("../input/bike-trips/202106-divvy-tripdata.csv")

X202107_divvy_tripdata <- read.csv("../input/bike-trips/202107-divvy-tripdata.csv")

X202108_divvy_tripdata <- read.csv("../input/bike-trips/202108-divvy-tripdata.csv")

X202109_divvy_tripdata <- read.csv("../input/bike-trips/202109-divvy-tripdata.csv")

X202110_divvy_tripdata <- read.csv("../input/bike-trips/202110-divvy-tripdata.csv")

X202111_divvy_tripdata <- read.csv("../input/bike-trips/202111-divvy-tripdata.csv")

X202112_divvy_tripdata <- read.csv("../input/bike-trips/202112-divvy-tripdata.csv")

X202201_divvy_tripdata <- read.csv("../input/bike-trips/202201-divvy-tripdata.csv")

X202202_divvy_tripdata <- read.csv("../input/bike-trips/202202-divvy-tripdata.csv")

X202203_divvy_tripdata <- read.csv("../input/bike-trips/202203-divvy-tripdata.csv")

X202204_divvy_tripdata <- read.csv("../input/bike-trips/202204-divvy-tripdata.csv")

X202205_divvy_tripdata <- read.csv("../input/bike-trips/202205-divvy-tripdata.csv")
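The twelve read.csv() calls above can also be written as a loop over the files in a folder. The sketch below shows one way to do that in base R, assuming every file follows the "YYYYMM-divvy-tripdata.csv" naming pattern; `data_dir`, `monthly`, `trips`, and the two tiny made-up files written to a temporary folder are illustrative names and data, not part of the original project.

```r
# Folder holding the monthly CSVs; a temporary folder is used here so the
# example runs anywhere (replace with the real download folder)
data_dir <- file.path(tempdir(), "bike-trips")
dir.create(data_dir, showWarnings = FALSE)

# Two tiny made-up monthly files, only so there is something to read
toy <- data.frame(ride_id = c("a1", "a2"), member_casual = c("casual", "member"))
write.csv(toy, file.path(data_dir, "202106-divvy-tripdata.csv"), row.names = FALSE)
write.csv(toy, file.path(data_dir, "202107-divvy-tripdata.csv"), row.names = FALSE)

# All monthly files, sorted by name (YYYYMM prefixes sort chronologically)
files <- sort(list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE))

# Read each file into one element of a list, named after the file
monthly <- lapply(files, read.csv)
names(monthly) <- basename(files)

# Stack the months into one data frame and reset the row names
trips <- do.call(rbind, monthly)
rownames(trips) <- NULL
```

Pointing `data_dir` at the real folder (and dropping the toy-file lines) could replace both the separate imports and the later rbind() call with a single list of monthly data frames.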

#viewing data

View(X202106_divvy_tripdata)
View(X202107_divvy_tripdata)
View(X202108_divvy_tripdata)
View(X202109_divvy_tripdata)
View(X202110_divvy_tripdata)
View(X202111_divvy_tripdata)
View(X202112_divvy_tripdata)
View(X202201_divvy_tripdata)
View(X202202_divvy_tripdata)
View(X202203_divvy_tripdata)
View(X202204_divvy_tripdata)
View(X202205_divvy_tripdata)

#checking for errors and consistency:

str(X202106_divvy_tripdata)
str(X202107_divvy_tripdata)
str(X202108_divvy_tripdata)
str(X202109_divvy_tripdata)
str(X202110_divvy_tripdata)
str(X202111_divvy_tripdata)
str(X202112_divvy_tripdata)
str(X202201_divvy_tripdata)
str(X202202_divvy_tripdata)
str(X202203_divvy_tripdata)
str(X202204_divvy_tripdata)
str(X202205_divvy_tripdata)

#checking that column names are consistent across files:

colnames(X202106_divvy_tripdata)
colnames(X202107_divvy_tripdata)
colnames(X202108_divvy_tripdata)
colnames(X202109_divvy_tripdata)
colnames(X202110_divvy_tripdata)
colnames(X202111_divvy_tripdata)
colnames(X202112_divvy_tripdata)
colnames(X202201_divvy_tripdata)
colnames(X202202_divvy_tripdata)
colnames(X202203_divvy_tripdata)
colnames(X202204_divvy_tripdata)
colnames(X202205_divvy_tripdata)
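Printing twelve sets of column names works, but comparing them by eye is error-prone. A hedged alternative is to compare every month's names against the first month's programmatically; the two one-row data frames and the names `monthly`, `reference_cols`, and `cols_match` below are made-up stand-ins for illustration.

```r
# Made-up stand-ins for two monthly data frames
jun <- data.frame(ride_id = "a1", started_at = "2021-06-01 08:00:00", member_casual = "member")
jul <- data.frame(ride_id = "b1", started_at = "2021-07-01 09:30:00", member_casual = "casual")
monthly <- list(jun = jun, jul = jul)

# Column names of the first month serve as the reference
reference_cols <- colnames(monthly[[1]])

# TRUE for each month whose column names (and their order) match the reference
cols_match <- vapply(monthly, function(df) identical(colnames(df), reference_cols), logical(1))
cols_match
all(cols_match)
```

rbind() on data frames matches columns by name, so this check, which also requires the same order, is slightly stricter than rbind() needs.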

#combining into a single data frame

dataset <- rbind(X202106_divvy_tripdata, X202107_divvy_tripdata, X202108_divvy_tripdata, X202109_divvy_tripdata, X202110_divvy_tripdata, X202111_divvy_tripdata, X202112_divvy_tripdata, X202201_divvy_tripdata, X202202_divvy_tripdata, X202203_divvy_tripdata, X202204_divvy_tripdata, X202205_divvy_tripdata)

#removing duplicate rows

data <- dataset[!duplicated(dataset), ]
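duplicated() only flags rows that are identical in every column. The sketch below, on made-up rows, contrasts that with checking ride_id on its own, under the assumption that ride_id is meant to identify a single trip:

```r
# Made-up rows: r2 appears twice with identical values, r3 twice with different values
rides <- data.frame(
  ride_id       = c("r1", "r2", "r2", "r3", "r3"),
  member_casual = c("member", "casual", "casual", "casual", "member")
)

n_dup_rows <- sum(duplicated(rides))          # rows identical to an earlier row: 1
n_dup_ids  <- sum(duplicated(rides$ride_id))  # ride_ids already seen earlier: 2

deduped <- rides[!duplicated(rides), ]        # same filter as in the code above
nrow(deduped)                                 # 4: the second r3 row survives
```

The exact-row filter removes the repeated r2 row but keeps both r3 rows because they differ in member_casual; a ride_id check would surface such cases for inspection.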

#splitting the dataset by rider type

casual_members_data <- filter(data, member_casual=="casual")

annual_members_data <- filter(data, member_casual=="member")
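The two filter() calls split the data by rider type. As a sketch on made-up rows, base R's split() does the same in one call, and a row-count check confirms that no rows carried a label other than "casual" or "member" (such rows would silently fall out of both filters). The names `by_type`, `casual_rides`, and `member_rides` are illustrative.

```r
# Made-up rides with the same member_casual labels as the trip data
rides <- data.frame(
  ride_id       = c("r1", "r2", "r3", "r4"),
  member_casual = c("casual", "member", "member", "casual")
)

table(rides$member_casual)   # rides per label; any unexpected label shows up here

by_type <- split(rides, rides$member_casual)   # list with one data frame per label
casual_rides <- by_type[["casual"]]
member_rides <- by_type[["member"]]

# Every row should land in exactly one of the two groups
stopifnot(nrow(casual_rides) + nrow(member_rides) == nrow(rides))
```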

#viewing data

View(casual_members_data)

View(annual_members_data)

Data Cleaning

Having reviewed the data for irregularities, we now convert the timestamp columns, add new fields, and remove bad records to prepare the data for analysis.

#converting started_at and ended_at from text to date-time (POSIXct) values

casual_members_data$started_at <- anytime(casual_members_data$started_at)

casual_members_data$ended_at <- anytime(casual_members_data$ended_at)

annual_members_data$started_at <- anytime(annual_members_data$started_at)

annual_members_data$ended_at <- anytime(annual_members_data$ended_at)

#adding ride_length: the duration of each ride in minutes, stored as a plain number

casual_members_data$ride_length <- as.numeric(difftime(casual_members_data$ended_at, casual_members_data$started_at, units = "mins"))

annual_members_data$ride_length <- as.numeric(difftime(annual_members_data$ended_at, annual_members_data$started_at, units = "mins"))
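To make the ride_length calculation concrete, here is a sketch on two made-up trips. It uses as.POSIXct() with a fixed "UTC" time zone in place of anytime() so the result does not depend on the machine's time zone; that substitution is an assumption of the example, not of the original code. The second trip ends before it starts, which is the kind of negative duration removed in the next step.

```r
# Two made-up trips; both columns use the same fixed time zone
started_at <- as.POSIXct(c("2021-06-13 14:11:28", "2021-06-13 15:00:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2021-06-13 14:31:28", "2021-06-13 14:55:00"), tz = "UTC")

# Duration in minutes, converted from difftime to a plain number
ride_length <- as.numeric(difftime(ended_at, started_at, units = "mins"))
ride_length   # 20 -5
```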

#removing bad data: rides that start at the "HQ QR" station and rides with a negative duration

good_casual_data <- casual_members_data[!(casual_members_data$start_station_name == "HQ QR" | casual_members_data$ride_length<0),]

good_annual_data <- annual_members_data[!(annual_members_data$start_station_name == "HQ QR" | annual_members_data$ride_length<0),]
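One subtlety in the base-R subsetting above: when a row's condition evaluates to NA (for example, a missing ride_length), `df[cond, ]` returns a row filled with NA instead of dropping it. The sketch below uses made-up station names and durations to compare that with dplyr::filter(), which drops rows whose condition is NA.

```r
library(dplyr)

# Made-up rides: one normal, one from "HQ QR", one negative, one missing duration
rides <- data.frame(
  start_station_name = c("Clark St", "HQ QR", "State St", "Lake Shore Dr"),
  ride_length        = c(12.5, 3.0, -2.0, NA)
)

# Base-R subsetting, as in the code above: the NA condition on the last row
# yields an all-NA row instead of dropping it
base_version <- rides[!(rides$start_station_name == "HQ QR" | rides$ride_length < 0), ]
nrow(base_version)   # 2 (the Clark St row plus an all-NA row)

# dplyr::filter() keeps only rows where every condition is TRUE
filtered <- filter(rides, start_station_name != "HQ QR", ride_length >= 0)
nrow(filtered)       # 1
```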

#adding columns that list the date, month, day, year, weekday of each ride

good_casual_data$date <- as.Date(good_casual_data$started_at) #The default format is yyyy-mm-dd

good_casual_data$month <- format(as.Date(good_casual_data$date), "%m")

good_casual_data$day <- format(as.Date(good_casual_data$date), "%d")

good_casual_data$year <- format(as.Date(good_casual_data$date), "%Y")

good_casual_data$day_of_week <- format(as.Date(good_casual_data$date), "%A")

good_annual_data$date <- as.Date(good_annual_data$started_at) #The default format is yyyy-mm-dd

good_annual_data$month <- format(as.Date(good_annual_data$date), "%m")

good_annual_data$day <- format(as.Date(good_annual_data$date), "%d")

good_annual_data$year <- format(as.Date(good_annual_data$date), "%Y")

good_annual_data$day_of_week <- format(as.Date(good_annual_data$date), "%A")
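format(..., "%A") returns weekday names in the session's locale, so the English levels passed to ordered() in the analysis below only match on an English-language system. A sketch of a locale-independent alternative, using one made-up timestamp and POSIXlt's `wday` field (0 = Sunday); `weekday_names` is an illustrative helper vector.

```r
# One made-up ride start time
started_at <- as.POSIXct("2021-06-13 14:11:28", tz = "UTC")
ride_date  <- as.Date(started_at)

# Numeric date parts do not depend on the locale
month <- format(ride_date, "%m")
day   <- format(ride_date, "%d")
year  <- format(ride_date, "%Y")

# Map wday (0 = Sunday ... 6 = Saturday) onto fixed English names
weekday_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday")
day_of_week <- weekday_names[as.POSIXlt(ride_date)$wday + 1]
day_of_week <- factor(day_of_week, levels = weekday_names, ordered = TRUE)
```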

Data Analysis

#checking types of bikes

unique(good_casual_data$rideable_type)

unique(good_annual_data$rideable_type)

#descriptive analysis on ride_length

mean(good_casual_data$ride_length, na.rm = TRUE) #straight average (total ride length / rides)

median(good_casual_data$ride_length,na.rm = TRUE) #midpoint number in the ascending array of ride lengths

max(good_casual_data$ride_length, na.rm = TRUE) #longest ride

min(good_casual_data$ride_length, na.rm = TRUE) #shortest ride

mean(good_annual_data$ride_length, na.rm = TRUE) #straight average (total ride length / rides)

median(good_annual_data$ride_length,na.rm = TRUE) #midpoint number in the ascending array of ride lengths

max(good_annual_data$ride_length, na.rm = TRUE) #longest ride

min(good_annual_data$ride_length, na.rm = TRUE) #shortest ride

sum(good_casual_data$ride_length, na.rm = TRUE) #total minutes ridden by casual riders

sum(good_annual_data$ride_length, na.rm = TRUE) #total minutes ridden by annual members
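The statistics above are computed one call at a time, separately for each rider type. Assuming a single data frame with a member_casual column (such as the combined data built later), dplyr can produce all of them per rider type in one grouped summary; the durations below are made-up minutes and `ride_stats` is an illustrative name.

```r
library(dplyr)

# Made-up ride durations in minutes
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member"),
  ride_length   = c(10, 30, 50, 8, 12)
)

# One row per rider type with count, mean, median, max, min, and total
ride_stats <- rides %>%
  group_by(member_casual) %>%
  summarise(
    number_of_rides = n(),
    mean_ride       = mean(ride_length, na.rm = TRUE),
    median_ride     = median(ride_length, na.rm = TRUE),
    max_ride        = max(ride_length, na.rm = TRUE),
    min_ride        = min(ride_length, na.rm = TRUE),
    total_ride      = sum(ride_length, na.rm = TRUE)
  )
ride_stats
```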

#Average ride_length for each day

good_casual_data$day_of_week <- ordered(good_casual_data$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

aggregate(ride_length ~ member_casual + day_of_week, data = good_casual_data, FUN = mean)

good_annual_data$day_of_week <- ordered(good_annual_data$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

aggregate(ride_length ~ member_casual + day_of_week, data = good_annual_data, FUN = mean)

#counting the total number of rides for each member type

nrow(good_casual_data)

nrow(good_annual_data)

#calculating number of rides and average duration for each weekday

good_casual_data %>%
  group_by(day_of_week) %>%
  summarise(num_of_rides = n(), avg_duration = mean(ride_length)) %>%
  arrange(day_of_week)

good_annual_data %>%
  group_by(day_of_week) %>%
  summarise(num_of_rides = n(), avg_duration = mean(ride_length)) %>%
  arrange(day_of_week)

Data Visualization

#combining both datasets

good_data <- rbind(good_annual_data, good_casual_data)

View(good_data)

#removing rows that contain NA in any column

good_data <- na.omit(good_data)
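na.omit() drops a row if any column is NA, not only ride_length, so rows missing something unrelated to the analysis (for instance, an end coordinate) are removed too. A sketch on made-up rows comparing it with tidyr::drop_na(), which can restrict the check to chosen columns:

```r
library(tidyr)

# Made-up rides: r2 lacks a duration, r3 lacks an end latitude
rides <- data.frame(
  ride_id     = c("r1", "r2", "r3"),
  ride_length = c(12, NA, 7),
  end_lat     = c(41.9, 41.8, NA)
)

all_complete <- na.omit(rides)               # drops r2 and r3: NA in any column
length_known <- drop_na(rides, ride_length)  # drops only r2: checks ride_length alone
```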

#plotting the number of rides per weekday for each member type

good_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  arrange(member_casual, day_of_week) %>%
  ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")

#plotting the average ride duration per weekday for each member type

good_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  arrange(member_casual, day_of_week) %>%
  ggplot(aes(x = day_of_week, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
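The two charts above use ggplot2's default axis labels (the column names). A sketch of the same dodged bar chart with readable labels, built on a small made-up summary table that has the same column names as the pipeline output; `rides_by_day` and `rides_plot` are illustrative names.

```r
library(ggplot2)

weekday_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday")

# Made-up counts with the same columns as the summarise() output above
rides_by_day <- data.frame(
  member_casual   = rep(c("casual", "member"), each = 2),
  day_of_week     = factor(rep(c("Saturday", "Sunday"), times = 2),
                           levels = weekday_names),
  number_of_rides = c(120, 100, 90, 80)
)

rides_plot <- ggplot(rides_by_day,
                     aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_x_discrete(drop = FALSE) +   # keep weekdays with no rows on the axis
  labs(title = "Number of rides by day of week",
       x = "Day of week", y = "Number of rides", fill = "Rider type")
```

Printing `rides_plot` draws the chart; the same `labs()` call can be appended to either pipeline above.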

Insights & Recommendations

Insights

Annual members take more rides than casual riders.
Only casual riders use docked bikes, while the other bike types are used more often by annual members.
Casual riders take longer rides (by duration) than annual members.
Tuesdays and Sundays are the busiest days of the week.
Casual riders use Cyclistic bikes the most on Tuesdays, while annual members ride the most on Saturdays.
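The docked-bike insight depends on how bike types split within each rider type, which the code above inspects only with unique(). A sketch of one way to quantify it, on made-up rows and assuming the rideable_type values are classic_bike, electric_bike, and docked_bike; `bike_share` is an illustrative name.

```r
library(dplyr)

# Made-up rides with assumed rideable_type values
rides <- data.frame(
  member_casual = c("casual", "casual", "casual",
                    "member", "member", "member", "member"),
  rideable_type = c("docked_bike", "classic_bike", "electric_bike",
                    "classic_bike", "classic_bike", "electric_bike", "electric_bike")
)

# Rides per rider type and bike type, plus each bike type's share within a rider type
bike_share <- rides %>%
  count(member_casual, rideable_type) %>%
  group_by(member_casual) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()
bike_share
```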

Recommendations

Run direct marketing campaigns on Tuesdays at the stations where casual riders most often start and end their rides, explaining the benefits of a Cyclistic annual membership.
Send tailored emails to new and casual riders that emphasize the benefits of an annual membership and encourage them to sign up.
Use social media campaigns targeted at Chicago to promote the benefits of annual memberships.
