Case Study 1

How Does a Bike-Share Navigate Speedy Success?

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximising the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualisations.
My presentation: How Does a Bike-Share Navigate Speedy Success? - Nithika Pidikiti

Characters and teams

Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.
Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic's marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals, as well as how you, as a junior data analyst, can help Cyclistic achieve them.
Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic's marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic's finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

1. Ask

Client/Sponsor:
Cyclistic Marketing Team

Purpose:
Cyclistic is focused on increasing its profitability by boosting the number of annual memberships. To achieve this goal, it is essential to understand how annual members and casual riders differ in their use of Cyclistic bikes. By analysing historical bike trip data, the marketing team aims to identify distinct usage patterns between these two groups. This understanding will guide the design of targeted marketing strategies aimed at converting casual riders into annual members. The insights gained from this analysis will be critical for presenting actionable recommendations to the Cyclistic executive team, who will make the final decision on whether to implement the proposed marketing strategy.

The primary problem is to discern how annual members and casual riders differ in their bike usage. Addressing this issue will enable the development of marketing strategies tailored to increase annual memberships. Insights from this analysis will reveal specific behaviours and preferences of each rider type, which can be used to craft targeted marketing campaigns and promotions that resonate with casual riders and encourage them to opt for annual memberships.

Scope / Major Project Activities:

Data Collection - Collect historical bike trip data to understand usage patterns of annual members and casual riders.
Data Analysis - Analyse the data to identify differences in trip frequency, duration, time of day, and location.
Insights and Findings - Extract key insights regarding the behaviours and preferences of each rider type.
Recommendation - Develop targeted marketing strategies based on the analysis to convert casual riders to annual members.
Final Report - Compile a comprehensive report detailing the differences in bike usage, supported by data visualisations and actionable recommendations.

This project does not include:

Implementing any marketing strategies or campaigns.
Analysis of bike usage data beyond the historical scope provided.

Deliverables:

Usage Pattern Analysis - Detailed analysis of how annual members and casual riders use Cyclistic bikes differently.
Insights Report - Key findings regarding the behaviours and preferences of each rider type.
Marketing Recommendations - Targeted strategies to convert casual riders into annual members based on the data.
Final Report - Comprehensive report including data-driven insights and visualisations for executive review.

Estimated date of Completion:

October 10, 2024

2. Prepare

Download the Cyclistic trip data here. I used data from July 9th, 2024 (03:17:18 AM) to September 5th, 2024 (10:20:26 AM), since this was the latest data available when the project was carried out in 2024. The data is organised in monthly CSV files, each with 13 columns. The dataset is provided by Motivate International Inc. under this license. It contains no personal information about the riders, ensuring privacy and compliance with data protection standards.

To verify the data’s integrity, all columns were checked for consistency and proper data types. The dataset is highly suitable for the case study as it aligns with the business questions being addressed. However, there are some limitations, particularly with missing information regarding station names and station IDs. Despite this, the dataset remains adequate for analysis and supports the objectives of this case study. There are 3 CSV files in total.

It is structured data, organised in rows (records) and columns (fields). Each record represents one trip, and each trip has a unique field that identifies it: ride_id. Each trip is anonymised and includes the following fields:

* ride_id #Ride id - unique

* rideable_type #Bike type - Classic, Docked, Electric

* started_at #Trip start day and time

* ended_at #Trip end day and time

* start_station_name #Trip start station

* start_station_id #Trip start station id

* end_station_name #Trip end station

* end_station_id #Trip end station id

* start_lat #Trip start latitude

* start_lng #Trip start longitude

* end_lat #Trip end latitude

* end_lng #Trip end longitude

* member_casual #Rider type - Member or Casual
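
As a quick check on this schema, the following minimal sketch (base R only) confirms that the 13 expected columns are present, that ride_id is unique, and how many station fields are missing. The `expected_cols` vector and the three sample rows are made-up stand-ins for one monthly file, not real Cyclistic records.

```r
# The 13 fields listed above
expected_cols <- c("ride_id", "rideable_type", "started_at", "ended_at",
                   "start_station_name", "start_station_id",
                   "end_station_name", "end_station_id",
                   "start_lat", "start_lng", "end_lat", "end_lng",
                   "member_casual")

# Toy stand-in for one monthly file (illustrative values only)
trips <- data.frame(
  ride_id            = c("A1", "B2", "C3"),
  rideable_type      = c("classic_bike", "electric_bike", "electric_bike"),
  started_at         = c("2024-07-09 03:17:18", "2024-07-10 08:00:00", "2024-07-11 17:30:00"),
  ended_at           = c("2024-07-09 03:30:00", "2024-07-10 08:12:00", "2024-07-11 18:05:00"),
  start_station_name = c("Station X", NA, "Station Y"),
  start_station_id   = c("X1", NA, "Y1"),
  end_station_name   = c("Station Y", "Station X", NA),
  end_station_id     = c("Y1", "X1", NA),
  start_lat          = c(41.88, 41.90, 41.92),
  start_lng          = c(-87.63, -87.64, -87.65),
  end_lat            = c(41.92, 41.88, 41.93),
  end_lng            = c(-87.65, -87.63, -87.66),
  member_casual      = c("member", "casual", "casual"),
  stringsAsFactors   = FALSE
)

# Columns that are expected but absent (empty when the schema matches)
missing_cols <- setdiff(expected_cols, names(trips))

# Number of missing values per column, e.g. station names and ids
na_counts <- colSums(is.na(trips))

# TRUE when every ride_id appears exactly once
ride_ids_unique <- !anyDuplicated(trips$ride_id)

print(missing_cols)
print(na_counts[na_counts > 0])
print(ride_ids_unique)
```

Running the same three checks on each real monthly file would show how widespread the missing station information is before any rows are dropped.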

3. Process

For this project, I used Excel for initial data manipulation, R for more complex statistical analysis and visualisations, and Tableau for additional visualisations. R is particularly effective for conducting in-depth analysis, generating insights, and producing dynamic visualisations, while Excel facilitated quick visualisations and preliminary exploration of the data. To maintain data integrity, I checked the dataset thoroughly for consistency and correctness.

The following steps were taken to ensure the data is clean and ready for analysis:

Data validation was performed to identify potential errors.
Null data was highlighted using conditional formatting for further investigation.
Mistyped words and numbers were corrected through systematic checks.
Extra spaces and characters were removed using the trim function.
Duplicate values were handled with the distinct function.
Mismatched data types were corrected to align with the required formats.
Inconsistent strings were standardised for consistency.
Date formats were unified across the dataset.
Misleading variable labels were clarified to ensure accurate interpretation.
Truncated data and other inconsistencies were resolved.

I have documented the entire process (including cleaning) within R in the analyse section, enabling a transparent review and the ability to share these results when needed. This thorough documentation allows for easy tracking and verification of all the steps taken to prepare the data for analysis.
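
A few of the cleaning steps listed above (trimming spaces, standardising strings, removing duplicates) can be sketched in base R as follows. This is an illustrative example on made-up rows, not the exact code used in this project; `trimws()` and `duplicated()` stand in here for the trim and distinct functions mentioned above.

```r
# Toy data with stray spaces, inconsistent case and one duplicate row
raw <- data.frame(
  ride_id          = c("A1", "A1", "B2", "C3"),
  member_casual    = c(" member", " member", "Casual ", "casual"),
  stringsAsFactors = FALSE
)

cleaned <- raw

# Remove leading/trailing spaces and standardise case
cleaned$member_casual <- tolower(trimws(cleaned$member_casual))

# Drop rows that are exact duplicates of an earlier row
cleaned <- cleaned[!duplicated(cleaned), ]
rownames(cleaned) <- NULL

# Count remaining missing values per column
na_per_column <- colSums(is.na(cleaned))

print(cleaned)
print(na_per_column)
```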

EXCEL PROCESS - Initial data cleaning and manipulation

The next step is to ensure the data is stored appropriately and prepared for analysis. To achieve this, I downloaded all 3 zip files, unzipped them, and created a temporary folder on my desktop to house the files. I then organised the files into subfolders for .CSV files and .XLS files, maintaining a copy of the original data. Next, I launched Excel and opened each file, saving each one as an Excel Workbook file (.xlsx) to ensure compatibility and ease of analysis. For each of the 3 Excel files, I performed the following operations:

Changed format of started_at and ended_at columns
- Formatted as custom DATETIME
- Format > Cells > Custom > dd/mm/yyyy h:mm:ss

Created a column called ride_length
- Calculated the length of each ride by subtracting the column started_at from the column ended_at - using the formula = [@[ended_at]] - [@[started_at]]
- Formatted as TIME
- Format > Cells > Time > HH:MM:SS (13:30:55)

Created a column called day_of_week
- Calculated the day of the week that each ride started using the WEEKDAY command (example: =WEEKDAY(D2,1))
- Formatted as a NUMBER with no decimals
- Format > Cells > Number (no decimals) > 1,2,3,4,5,6,7
- Note: 1 = Sunday and 7 = Saturday
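
For comparison, the two Excel columns above can be reproduced in base R. The sketch below uses a single made-up trip (the timestamps are assumptions chosen for illustration) and follows the same convention as WEEKDAY(...,1), where 1 = Sunday and 7 = Saturday.

```r
# One illustrative trip; 13 July 2024 was a Saturday
started_at <- as.POSIXct("2024-07-13 09:00:00", tz = "UTC")
ended_at   <- as.POSIXct("2024-07-13 09:25:30", tz = "UTC")

# ride_length = ended_at - started_at, expressed in seconds
ride_length_secs <- as.numeric(difftime(ended_at, started_at, units = "secs"))

# Same value shown as HH:MM:SS, like the Excel time format
ride_length_hms <- sprintf("%02d:%02d:%02d",
                           as.integer(ride_length_secs %/% 3600),
                           as.integer((ride_length_secs %% 3600) %/% 60),
                           as.integer(ride_length_secs %% 60))

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday); add 1 to match Excel
day_of_week <- as.POSIXlt(started_at)$wday + 1

print(ride_length_hms)
print(day_of_week)
```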

FINISHED EXCEL SHEETS SMALL PREVIEW

4. Analyse

To arrive at these insights, the following steps were performed in R:

Step 1: Install and Load Required Libraries

# Install required packages

install.packages("tidyverse")

install.packages("readxl")

install.packages("lubridate")

# Load the libraries

library(tidyverse)

library(readxl)

library(lubridate)

Step 2: Import Data from Excel Files

# Load the Excel datasets

data_august <- read_excel("~/Desktop/CASE STUDY 1 /3/untitled folder/AUGUST DATA - NEETHU.xlsx")

data_july <- read_excel("~/Desktop/CASE STUDY 1 /3/JUY/Neethuv1 JULY(AutoRecovered).xlsx")

data_september <- read_excel("~/Desktop/CASE STUDY 1 /3/SEP/Sep 2024 neethu data.xlsx")

# Verify that data has been imported correctly

print(head(data_august)) # Check first few rows of the August data

print(head(data_july)) # Check first few rows of the July data

print(head(data_september)) # Check first few rows of the September data

Step 3: Inspect and Clean the Data

Inspect Column Names and Data Structure

# Check column names for each file

colnames(data_august)

colnames(data_july)

colnames(data_september)

# Inspect the structure of each file

str(data_august)

str(data_july)

str(data_september)

Rename Columns for Consistency (if necessary)

# Standardize column names across all datasets

# For August data, no renaming needed except for "Ride_length" and "day_of_week"

data_august <- rename(data_august,

ride_length = "Ride_length", # Ensure consistent lowercase

weekday = "day_of_week") # Match with September's Weekday

# For July data, renaming "Source.Name", "Ride_length", and "day_of_week"

data_july <- rename(data_july,

ride_length = "Ride_length", # Ensure consistent lowercase

weekday = "day_of_week", # Match with September's Weekday

source_name = "Source.Name") # Renaming Source.Name

# For September data, rename "Weekday" to "weekday" so all three files share the same column name

data_september <- rename(data_september,

weekday = "Weekday") # Match the "weekday" column in the August and July data

#verification of changes

colnames(data_august)

colnames(data_july)

colnames(data_september)

More Conversions

library(dplyr)

library(lubridate) # For date-time conversion

Convert date-time columns:
# Convert date-time columns in data_august

data_august <- data_august %>%

mutate(

started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"),

ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")

)

# Convert date-time columns in data_july

data_july <- data_july %>%

mutate(

started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"),

ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")

)

# Convert date-time columns in data_september

data_september <- data_september %>%

mutate(

started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"),

ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")

)

Convert end_station_id Columns: Convert the end_station_id column in each dataset to character type

# Convert end_station_id to character in data_august

data_august <- data_august %>%

mutate(end_station_id = as.character(end_station_id))

# Convert end_station_id to character in data_july

data_july <- data_july %>%

mutate(end_station_id = as.character(end_station_id))

# Convert end_station_id to character in data_september

data_september <- data_september %>%

mutate(end_station_id = as.character(end_station_id))

Convert start_lat Columns: Convert the start_lat column in each dataset to double type for consistency

# Convert start_lat to double in data_august

data_august <- data_august %>%

mutate(start_lat = as.double(start_lat))

# Convert start_lat to double in data_july

data_july <- data_july %>%

mutate(start_lat = as.double(start_lat))

# Convert start_lat to double in data_september

data_september <- data_september %>%

mutate(start_lat = as.double(start_lat))

# Convert start_lng to double in all datasets

data_august <- data_august %>%

mutate(start_lng = as.double(start_lng))

data_july <- data_july %>%

mutate(start_lng = as.double(start_lng))

data_september <- data_september %>%

mutate(start_lng = as.double(start_lng))

# Convert end_lat to double in all datasets

data_august <- data_august %>%

mutate(end_lat = as.double(end_lat))

data_july <- data_july %>%

mutate(end_lat = as.double(end_lat))

data_september <- data_september %>%

mutate(end_lat = as.double(end_lat))

# Convert end_lng to double in all datasets

data_august <- data_august %>%

mutate(end_lng = as.double(end_lng))

data_july <- data_july %>%

mutate(end_lng = as.double(end_lng))

data_september <- data_september %>%

mutate(end_lng = as.double(end_lng))
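
The per-dataset conversions above repeat the same steps three times. As a possible alternative, the sketch below wraps them in one helper and applies it to every monthly data frame with `lapply()`. The function name `standardise_types` and the two toy data frames are hypothetical and only illustrate the idea; they are not part of the original workflow.

```r
# Hypothetical helper: apply the same type conversions to any monthly data frame
standardise_types <- function(df) {
  time_cols  <- c("started_at", "ended_at")
  coord_cols <- c("start_lat", "start_lng", "end_lat", "end_lng")

  # Date-time columns to POSIXct in UTC
  df[time_cols] <- lapply(df[time_cols], as.POSIXct,
                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # Station ids to character, coordinates to double
  df$end_station_id <- as.character(df$end_station_id)
  df[coord_cols] <- lapply(df[coord_cols], as.double)
  df
}

# Toy stand-ins with deliberately mismatched types (made-up values)
toy_july <- data.frame(
  started_at = "2024-07-09 03:17:18", ended_at = "2024-07-09 03:30:00",
  end_station_id = 13022,
  start_lat = "41.88", start_lng = "-87.63", end_lat = "41.90", end_lng = "-87.64",
  stringsAsFactors = FALSE
)
toy_august <- data.frame(
  started_at = "2024-08-01 08:00:00", ended_at = "2024-08-01 08:20:00",
  end_station_id = "TA1307000117",
  start_lat = 41.91, start_lng = -87.65, end_lat = 41.89, end_lng = -87.62,
  stringsAsFactors = FALSE
)

# Convert every data frame the same way, then stack them
monthly <- lapply(list(july = toy_july, august = toy_august), standardise_types)
all_toy <- do.call(rbind, monthly)

str(all_toy)
```

Keeping the conversions in one place means a change to a format string or column type only has to be made once.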

Step 4: Combine the data sets

# Combine the three datasets into one dataframe

all_trips <- bind_rows(data_august, data_july, data_september)

# Check the structure of the combined dataframe

str(all_trips)

# Optionally, view the first few rows

head(all_trips)

Step 5: Clean and Process the Data

Remove unnecessary columns

# Remove unnecessary columns

all_trips <- all_trips %>%

select(-c(start_lat, start_lng, end_lat, end_lng))

Handle missing or incorrect Data

# Remove rows with missing start/end times

all_trips <- all_trips %>%

filter(!is.na(started_at) & !is.na(ended_at)) # Ensure there are no missing start/end times

# Convert ride_id to character (if it's not already)

all_trips <- all_trips %>%

mutate(ride_id = as.character(ride_id))

# Standardize "member_casual" column values

all_trips <- all_trips %>%

mutate(member_casual = recode(member_casual, "Subscriber" = "member", "Customer" = "casual"))

Add Date-Related Columns

# Ensure 'started_at' is in POSIXct format

all_trips$started_at <- as.POSIXct(all_trips$started_at)

# Add new columns for date, month, day, year, and day of the week

all_trips$date <- as.Date(all_trips$started_at) # Extract date

all_trips$month <- format(all_trips$date, "%m") # Extract month

all_trips$day <- format(all_trips$date, "%d") # Extract day

all_trips$year <- format(all_trips$date, "%Y") # Extract year

all_trips$day_of_week <- format(all_trips$date, "%A") # Extract day of the week

Step 6: Calculate Ride Length

# Calculate the ride length in seconds as a plain numeric value

all_trips$ride_length <- as.numeric(difftime(all_trips$ended_at, all_trips$started_at, units = "secs"))

# Remove rides with zero or negative durations

all_trips <- all_trips %>% filter(ride_length > 0)

Step 7: Conduct descriptive Analysis

Summary Statistics for Ride Length

# Calculate basic summary statistics for ride length

mean_ride_length <- mean(all_trips$ride_length, na.rm = TRUE)

median_ride_length <- median(all_trips$ride_length, na.rm = TRUE)

max_ride_length <- max(all_trips$ride_length, na.rm = TRUE)

min_ride_length <- min(all_trips$ride_length, na.rm = TRUE)

# Display the summary statistics

mean_ride_length

median_ride_length

max_ride_length

min_ride_length

Grouped Analysis by User Type

# Summary statistics by user type (member vs casual)

user_summary <- all_trips %>%

group_by(member_casual) %>%

summarise(

mean_ride_length = mean(ride_length, na.rm = TRUE),

median_ride_length = median(ride_length, na.rm = TRUE),

max_ride_length = max(ride_length, na.rm = TRUE),

min_ride_length = min(ride_length, na.rm = TRUE)

)

# Display user_summary

print(user_summary)

Step 8: Visualization

install.packages("munsell")

install.packages("ggplot2")

packageVersion("ggplot2")

install.packages("farver")

Average Ride Length by User Type

# Plot the average ride length by user type

ggplot(all_trips, aes(x = member_casual, y = ride_length, fill = member_casual)) +

geom_bar(stat = "summary", fun = "mean") +

labs(title = "Average Ride Length by User Type", x = "User Type", y = "Average Ride Length (secs)")

Number of Rides by Day of the Week

# Plot the number of rides by day of the week

ggplot(all_trips, aes(x = day_of_week, fill = member_casual)) +

geom_bar() +

labs(title = "Number of Rides by Day of the Week", x = "Day of Week", y = "Number of Rides")
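
Because `format(date, "%A")` returns plain character strings, ggplot2 arranges the days alphabetically on the x-axis rather than in calendar order. One way to fix the order is to convert the column to a factor with explicit levels. The sketch below shows the idea on a few made-up values and assumes English day names (the output of `%A` depends on the locale).

```r
# Calendar order, starting on Sunday
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# Made-up day names, as produced by format(date, "%A") in an English locale
day_names <- c("Saturday", "Monday", "Sunday", "Saturday", "Friday")

# A factor keeps the levels in the given order, even for days with no rides
day_factor <- factor(day_names, levels = day_levels)

# Counts now appear from Sunday to Saturday instead of alphabetically
ride_counts <- table(day_factor)
print(ride_counts)
```

Applied to the real data, this would be `all_trips$day_of_week <- factor(all_trips$day_of_week, levels = day_levels)` before calling ggplot.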

Histogram of Ride Lengths

# Work on a copy of the cleaned data so that all_trips stays untouched
# (all_trips already contains ride_length and day_of_week from Steps 5 and 6)

all_trips_v2 <- all_trips

# Check that all_trips_v2 was created successfully

print(head(all_trips_v2)) # View the first few rows

# Plot the distribution of ride lengths in one-minute (60-second) bins

ggplot(all_trips_v2, aes(x = ride_length)) +
  geom_histogram(binwidth = 60, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Ride Lengths", x = "Ride Length (seconds)", y = "Number of Rides") +
  theme_minimal()

Boxplot of Ride Lengths by Member Type

# Check the structure of all_trips_v2

str(all_trips_v2)

# View the first few rows of your data frame

head(all_trips_v2)

# Ensure ride_length is numeric

all_trips_v2$ride_length <- as.numeric(all_trips_v2$ride_length)

# Convert member_casual to a factor if not already

all_trips_v2$member_casual <- as.factor(all_trips_v2$member_casual)

library(ggplot2)

# Create a boxplot for ride_length by member_casual

ggplot(all_trips_v2, aes(x = member_casual, y = ride_length, fill = member_casual)) +

geom_boxplot() +

labs(title = "Ride Length Distribution by User Type", x = "User Type", y = "Ride Length (seconds)") +

theme_minimal()

# Check for NA values

sum(is.na(all_trips_v2$ride_length))

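# Optionally remove rows with NA values
# Note: na.omit() drops a row if ANY column is NA (for example a missing station name), not only ride_length

all_trips_v2 <- na.omit(all_trips_v2)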

summary(all_trips_v2$ride_length)

Average Ride Duration Over Days of the Week

library(dplyr)

library(ggplot2)

library(lubridate) # Ensure you have lubridate loaded for wday()

# Create the line plot for average ride duration by day of the week

all_trips_v2 %>%

mutate(weekday = wday(started_at, label = TRUE)) %>%

group_by(member_casual, weekday) %>%

summarise(average_duration = mean(ride_length), .groups = "drop") %>%

ggplot(aes(x = weekday, y = average_duration, color = member_casual, group = member_casual)) +

geom_line(linewidth = 1) + # Updated from size to linewidth

geom_point(size = 3) +

labs(title = "Average Ride Duration by Day of the Week", x = "Day of Week", y = "Average Duration (seconds)") +

theme_minimal() +

scale_y_continuous(labels = scales::comma)

Ride Count by Weekday and Member Type

# Ride Count by Weekday and Member Type (Stacked Bar Plot)

all_trips_v2 %>%

mutate(weekday = wday(started_at, label = TRUE)) %>%

group_by(member_casual, weekday) %>%

summarise(number_of_rides = n(), .groups = "drop") %>%

ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +

geom_bar(stat = "identity") +

labs(title = "Total Number of Rides by User Type and Day of the Week", x = "Day of Week", y = "Number of Rides") +

theme_minimal()

Average Ride Length for Each User Type Over Time

# Average Ride Length Over Time

all_trips_v2 %>%

mutate(date = as.Date(started_at)) %>% # Extract date from datetime

group_by(date, member_casual) %>%

summarise(average_duration = mean(ride_length), .groups = "drop") %>%

ggplot(aes(x = date, y = average_duration, color = member_casual)) +

geom_line() +

labs(title = "Average Ride Length Over Time by User Type", x = "Date", y = "Average Ride Length (seconds)") +

theme_minimal()

Annual Members vs Casual Riders (Pie chart)

library(dplyr)

library(ggplot2)

# Summarize data: Count of rides by user type and calculate percentage

percentage_data <- all_trips_v2 %>%

group_by(member_casual) %>%

summarise(number_of_rides = n()) %>%

mutate(percentage = (number_of_rides / sum(number_of_rides)) * 100) # Calculate percentage

# Create pie chart for percentage of rides by user type

ggplot(percentage_data, aes(x = "", y = percentage, fill = member_casual)) +

geom_bar(stat = "identity", width = 1) + # Create the pie chart

coord_polar("y") + # Convert to polar coordinates for pie chart

labs(title = "Annual Members vs Casual Riders",

fill = "User Type",

y = "Percentage of Rides") +

theme_void() + # Use a void theme for a cleaner look

theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold")) + # Center title

geom_text(aes(label = paste0(round(percentage, 1), "%")),

position = position_stack(vjust = 0.5),

color = "white") # Add percentage labels to the pie slices

Quarterly Trip Totals (July, August, September)

# Filter data for Q3 (July, August, September)

q3_data <- all_trips_v2 %>%

filter(month(started_at) %in% c(7, 8, 9))

# Summarize total rides for members vs casual riders in Q3

q3_totals <- q3_data %>%

group_by(member_casual) %>%

summarise(total_rides = n())

# Visualize total rides by user type

ggplot(q3_totals, aes(x = member_casual, y = total_rides, fill = member_casual)) +

geom_bar(stat = "identity") +

labs(title = "Total Rides in Q3 (July, August, September)", x = "User Type", y = "Total Rides") +

theme_minimal() +

scale_y_continuous(labels = scales::comma)

Median Ride Length by User Type

library(dplyr)

library(ggplot2)

library(lubridate) # Make sure to load lubridate for month() function

# Check if all_trips_v2 is loaded

if (exists("all_trips_v2")) {

# Filter data for Q3 (July, August, September)

q3_data <- all_trips_v2 %>%

filter(month(started_at) %in% c(7, 8, 9))

# Check if filtered data is valid

if (nrow(q3_data) > 0) {

# Calculate median ride length for members and casual riders

q3_median_ride_length <- q3_data %>%

group_by(member_casual) %>%

summarise(median_ride_length = median(ride_length, na.rm = TRUE)) # Remove NA values

# Visualize median ride length for members vs casual riders

ggplot(q3_median_ride_length, aes(x = member_casual, y = median_ride_length, fill = member_casual)) +

geom_bar(stat = "identity") +

labs(title = "Median Cyclistic Bike Ride Length", x = "User Type", y = "Median Ride Length (seconds)") +

theme_minimal() +

scale_y_continuous(labels = scales::comma) # Format y-axis with commas

} else {

print("Filtered data for Q3 is empty.")

}

} else {

print("Dataset 'all_trips_v2' is not loaded.")

}

Ride Duration by Day of the Week

# Add a weekday column

q3_data <- q3_data %>%

mutate(weekday = wday(started_at, label = TRUE))

# Summarize median ride length by day of week and user type

q3_ride_duration_by_day <- q3_data %>%

group_by(member_casual, weekday) %>%

summarise(median_ride_length = median(ride_length, na.rm = TRUE), .groups = 'drop') # Add .groups argument to avoid warning

# Visualize median ride length by day of the week

ggplot(q3_ride_duration_by_day, aes(x = weekday, y = median_ride_length, color = member_casual, group = member_casual)) +

geom_line(linewidth = 1) +

geom_point(size = 3) +

labs(title = "Median Ride Length by Day of Week", x = "Day of Week", y = "Median Ride Length (seconds)") + # Updated title

theme_minimal() +

scale_y_continuous(labels = scales::comma)

Bike Type Preferences

# Summarize bike type preferences by user type

q3_bike_type_pref <- q3_data %>%

group_by(member_casual, rideable_type) %>%

summarise(total_rides = n(), .groups = 'drop') # Added .groups argument to avoid warning

# Visualize bike type preferences by user type

ggplot(q3_bike_type_pref, aes(x = rideable_type, y = total_rides, fill = member_casual)) +

geom_bar(stat = "identity", position = "dodge") +

labs(title = "Bike Type Preferences", x = "Bike Type", y = "Total Rides") + # Updated title

theme_minimal() +

scale_y_continuous(labels = scales::comma)

Total Rides by Day of the Week (Casual Riders vs Annual Members)

# Summarize total rides by day of week and user type

q3_rides_by_day <- q3_data %>%

group_by(member_casual, weekday) %>%

summarise(total_rides = n(), .groups = 'drop') # Added .groups argument to avoid warning

# Visualize total rides by day of the week

ggplot(q3_rides_by_day, aes(x = weekday, y = total_rides, fill = member_casual)) +

geom_col(position = "dodge") +

labs(title = "Total Rides by the Day of the Week", x = "Day of Week", y = "Total Rides") + # Updated title

theme_minimal() +

scale_y_continuous(labels = scales::comma)

Small snippets of my code in R

After cleaning the data and creating the visualisations, the next step was to analyse the distinct usage patterns of Cyclistic bikes between annual members and casual riders using R. This analysis included various visualisations and statistical assessments to understand user behaviour comprehensively. Key metrics examined were summary statistics for ride length, grouped analysis by user type, and average ride length comparisons. Additionally, I explored the number of rides by day of the week, along with histograms and box-plots of ride lengths categorised by member type. To further delve into usage trends, I analysed average ride duration across the week, ride counts segmented by weekday and user type, and average ride lengths over time. Visual representations such as a pie chart comparing annual members to casual riders, quarterly trip totals for July, August, and September, median ride lengths by user type, ride duration by day of the week, bike type preferences, and total rides by day of the week for both user types were also included. I also created a visualisation on Tableau Public: https://public.tableau.com/shared/6DSN67Y6T?:display_count=n&:origin=viz_share_link

My insights and visualisations are shown below:

Annual Members vs Casual Riders

During these 3 months, members accounted for 56.5% of Cyclistic's total trips while casual riders accounted for 43.5% of total trips. However, the percentage fluctuates across these 3 months.
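
To see how the split between members and casual riders changes from month to month, the shares can be computed per month with `prop.table()`. The sketch below uses a small made-up sample, so its percentages are illustrative only and are not the real Cyclistic figures quoted above.

```r
# Made-up sample of 12 trips across three months (not real data)
toy <- data.frame(
  month = c("07", "07", "07", "07",
            "08", "08", "08", "08", "08",
            "09", "09", "09"),
  member_casual = c("member", "member", "casual", "casual",
                    "member", "member", "member", "casual", "casual",
                    "member", "member", "casual"),
  stringsAsFactors = FALSE
)

# Overall share of trips by user type, in percent
overall_share <- round(100 * prop.table(table(toy$member_casual)), 1)

# Share by user type within each month (each row sums to 100)
monthly_share <- round(100 * prop.table(table(toy$month, toy$member_casual), margin = 1), 1)

print(overall_share)
print(monthly_share)
```

On the real data, the same two lines applied to `all_trips_v2$month` and `all_trips_v2$member_casual` would give the monthly breakdown behind the overall split.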

Average Ride Length by User Type

Data Interpretation:

Member Users: The average ride length for member users is significantly shorter, estimated to be around 900 seconds (15 minutes).
Casual Users: The average ride length for casual users is significantly longer, estimated to be around 1650 seconds (27.5 minutes).

Casual riders take longer rides than members. This is consistent with casual riders using the service mainly for leisure trips, which often take more time, while members are more likely to use it for shorter, routine trips such as commuting.

Key Observations and Implications:

Casual Users Have Longer Rides: As noted above, casual users typically have longer rides.
Member Users Have Shorter Rides: Member users tend to have shorter rides.
Pricing Strategy: A tiered pricing model could be beneficial, with discounts for members on longer rides and potentially higher rates for casual users on longer trips.
Service Optimisation: Understanding these differences can help optimize services for both user types, such as providing more docking stations in areas popular with members or improving convenience for casual users.
Marketing and Promotions: Targeted campaigns can attract and retain users based on their preferences.

Number of Rides by Day of the Week

Key Observations:

Days with Most Rides:
- Saturday has the highest number of total rides, followed by Friday and Sunday.
- These days show a higher proportion of casual riders compared to weekdays.
Rider Type Breakdown:
- Casual riders (red) dominate during weekends (Saturday and Sunday), significantly contributing to the high number of rides on those days.
- Member riders (blue) maintain a relatively consistent number of rides throughout the week, although their numbers are slightly higher on weekdays.
Days with Fewest Rides:
- Tuesday and Thursday have the fewest total rides, with fewer casual riders compared to the other days.
Ride Patterns:
- Casual riders seem to prefer riding during weekends and Fridays, likely indicating leisure or non-commuting purposes.
- Members are more consistent across all days, suggesting regular commuting or habitual usage.

Insights:

Casual riders likely use the service for leisure, peaking on weekends.
Members, on the other hand, use the service more consistently across the week, indicating they might rely on it for commuting or other daily activities.

Average Ride Duration by Day of the Week

Key Observations:

Casual Riders (Red Line):
- Weekends (Sunday and Saturday) have the longest average ride durations, especially Saturday, where the average ride duration peaks at around 1,750 seconds (approximately 29 minutes).
- Ride duration drops significantly on Monday and Tuesday, with Tuesday showing the lowest average ride time (around 1,250 seconds, or about 21 minutes).
- From Wednesday onwards, casual riders' average ride time gradually increases, peaking again on Saturday.
Member Riders (Blue Line):
- Members have a much lower and more stable average ride duration compared to casual riders, ranging between 700 and 1,000 seconds (approximately 12 to 17 minutes) throughout the week.
- The average ride duration for members is lowest on Monday and Tuesday and increases slightly towards Friday and Saturday, though the variation is not as pronounced as for casual riders.
Contrasting Patterns:
- Casual riders take longer rides on weekends, which aligns with the trend seen in the previous chart where casual riders have higher activity on weekends. These longer rides suggest leisurely use, possibly for exploration or recreation.
- Members consistently take shorter rides, which might indicate they use the service for commuting or shorter, more routine trips throughout the week.
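
Average ride duration per weekday and rider type can be derived from ride start and end timestamps. The following is a rough sketch under stated assumptions, not the original analysis code. The field names `started_at`, `ended_at`, and `member_casual`, along with the sample rows, are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical rows: (started_at, ended_at, member_casual).
sample_rides = [
    ("2024-06-01 10:00:00", "2024-06-01 10:30:00", "casual"),  # Saturday, 1800 s
    ("2024-06-01 15:00:00", "2024-06-01 15:20:00", "casual"),  # Saturday, 1200 s
    ("2024-06-01 09:00:00", "2024-06-01 09:15:00", "member"),  # Saturday, 900 s
    ("2024-06-04 08:00:00", "2024-06-04 08:10:00", "member"),  # Tuesday, 600 s
    ("2024-06-04 18:00:00", "2024-06-04 18:14:00", "member"),  # Tuesday, 840 s
]

def mean_duration_by_group(rows):
    """Return {(weekday, rider type): mean ride duration in seconds}."""
    durations = defaultdict(list)
    for started_at, ended_at, rider_type in rows:
        start = datetime.strptime(started_at, FMT)
        end = datetime.strptime(ended_at, FMT)
        seconds = (end - start).total_seconds()
        durations[(WEEKDAYS[start.weekday()], rider_type)].append(seconds)
    return {group: mean(values) for group, values in durations.items()}

averages = mean_duration_by_group(sample_rides)
for (day, rider_type), seconds in sorted(averages.items()):
    print(f"{day} {rider_type}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Printing seconds and minutes side by side makes it easier to check conversions such as the ones quoted in the observations above.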

Insights:

Casual riders' average ride duration spikes on weekends, suggesting they engage in longer, more leisurely rides, possibly for recreational purposes.
Member riders' average ride durations are shorter and more consistent, indicating they likely use the service for shorter trips, perhaps for commuting or regular travel during the week.

Median Ride Length by User Type

Key Observations:

Casual Riders (Red Bar):
- The median ride length for casual riders is significantly higher than that of members. It is approximately 750 seconds (12.5 minutes).
- This indicates that a typical casual ride is longer than a typical member ride.
Member Riders (Blue Bar):
- The median ride length for member riders is lower, at around 500 seconds (8.3 minutes).
- This suggests that members typically take shorter rides, likely for practical purposes such as commuting or short-distance travel.
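
The medians here are noticeably lower than the averages reported for the same groups. One plausible explanation is that a small number of very long rides pull the mean upward while leaving the median largely unchanged. The sketch below illustrates that effect with hypothetical ride lengths. The values are invented for the example and are not taken from the Cyclistic data.

```python
from statistics import mean, median

# Hypothetical ride lengths in seconds; the last value is a two-hour outlier.
ride_lengths = [420, 480, 540, 600, 660, 720, 7200]

avg = mean(ride_lengths)
mid = median(ride_lengths)

print(f"mean:   {avg:.0f} s ({avg / 60:.1f} min)")
print(f"median: {mid:.0f} s ({mid / 60:.1f} min)")

# Dropping the outlier moves the mean sharply but barely shifts the median.
trimmed = ride_lengths[:-1]
print(f"mean without outlier:   {mean(trimmed):.0f} s")
print(f"median without outlier: {median(trimmed):.0f} s")
```

Reporting both statistics, as this case study does, helps distinguish the typical ride from the influence of a few unusually long ones.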

Insights:

Casual riders tend to have longer ride durations, which could imply that they are more likely to use the service for recreational purposes or infrequent, but longer, trips.
Members have shorter median ride times, suggesting they might be using the service for quick, routine trips, likely as part of daily commutes or regular travel patterns.

This data aligns with the trends observed in the previous charts, where casual riders took longer rides, especially on weekends, while members had more consistent and shorter ride durations throughout the week.

Median Ride Length by Day of Week

Key Observations:

Casual Riders:
- Casual riders have the longest median ride lengths on Saturdays, reaching over 1,100 seconds.
- Their ride lengths drop significantly on weekdays, with the shortest durations on Tuesdays and Wednesdays (around 800 seconds).
- A gradual increase in ride lengths is observed from Thursday to Saturday.
Members:
- Members exhibit shorter and more consistent ride durations throughout the week.
- Their median ride lengths remain around 600 seconds across most days.
- Saturday sees a slight increase in ride time for members, reaching about 650 seconds, while Monday has the shortest median ride length at 550 seconds.
Comparing Casual vs. Member Riders:
- Casual riders consistently have longer ride lengths compared to members across all days.
- Casual riders show greater variability in ride length throughout the week, while members’ ride lengths remain relatively stable.

Insights:

Weekend Preferences:
- Both casual riders and members prefer longer rides on weekends, particularly on Saturdays. This may indicate that weekends are used for leisure rides or outings.
- Casual riders, in particular, show a significant increase in ride length during the weekends, suggesting they might be more interested in longer, recreational rides during these times.
Stable Ride Behavior of Members:
- The consistent ride lengths of members throughout the week suggest that they may be using the service more for regular commutes or short trips, such as for work or daily activities.
Opportunity for Promotions:
- With casual riders being more active on weekends, targeted promotions, such as weekend discounts or leisure ride packages, could encourage even longer rides.
- For members, focusing on weekday promotions or incentivizing weekend rides could balance out ride durations and boost engagement.

Bike Type Preferences

Key Observations:

Classic Bike Usage:
- Members have a significantly higher preference for classic bikes, with more than 30,000 rides.
- Casual riders also use classic bikes frequently, but their total rides are lower, just above 20,000.
Electric Bike Usage:
- Electric bike rides are fewer compared to classic bikes for both user groups.
- Members still dominate in electric bike usage, with around 15,000 rides.
- Casual riders have the least usage of electric bikes, with fewer than 10,000 rides.
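
Raw ride counts partly reflect how many rides each group takes overall, so it can help to compare the share of each bike type within a rider group as well. The sketch below shows one way to compute those shares. The field names `rideable_type` and `member_casual`, the category labels, and the sample rows are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical rows: (rideable_type, member_casual).
sample_rides = [
    ("classic_bike", "member"), ("classic_bike", "member"),
    ("electric_bike", "member"),
    ("classic_bike", "casual"), ("classic_bike", "casual"),
    ("classic_bike", "casual"), ("electric_bike", "casual"),
]

def bike_type_shares(rows):
    """Return {rider type: {bike type: share of that rider type's rides}}."""
    counts = Counter((rider, bike) for bike, rider in rows)
    totals = Counter(rider for _, rider in rows)
    shares = {}
    for (rider, bike), n in counts.items():
        shares.setdefault(rider, {})[bike] = n / totals[rider]
    return shares

shares = bike_type_shares(sample_rides)
for rider, by_bike in sorted(shares.items()):
    for bike, share in sorted(by_bike.items()):
        print(f"{rider} {bike}: {share:.0%}")
```

Comparing within-group shares separates bike preference from overall ride volume, which matters when one group simply rides more often.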

Insights:

Preference for Classic Bikes:
- Both groups, especially members, have a strong preference for classic bikes over electric bikes, suggesting that the classic option might be seen as more cost-effective or suitable for regular commuting.
- The higher use of classic bikes by members could also indicate a tendency to stick with familiar and possibly less expensive ride options.
Growth Opportunity for Electric Bikes:
- The relatively lower numbers of electric bike rides suggest an opportunity to promote their use, especially among casual riders.
- Offering incentives like discounts on electric bike rentals or promotions highlighting the convenience and speed of electric bikes could increase adoption.
Member Engagement:
- Members consistently show higher usage across both bike types, reflecting their deeper engagement with the service.
- It may be beneficial to leverage this engagement by introducing loyalty programs or rewards for frequent rides to further boost member satisfaction and retention.
Casual Rider Focus on Classic Bikes:
- Casual riders are more inclined towards classic bikes, which might be due to a preference for leisure rides where speed is less of a concern.
- Targeting casual riders with electric bike trial offers could encourage them to explore the faster option, potentially increasing electric bike usage among this group.

Total Rides by the Day of the Week

Key Observations:

Rides by Members:
- Members generally have a higher number of rides compared to casual riders on most weekdays.
- The peak for member rides is on Wednesday, with nearly 10,000 rides.
- Member ride counts are lower on weekends, especially on Sunday.
Rides by Casual Riders:
- Casual riders have a relatively stable number of rides across the week, but their numbers increase notably on Friday and Saturday.
- The highest number of rides for casual riders occurs on Saturday, where they nearly match the ride counts of members.
- Casual rides are lowest on Monday and Tuesday.
Comparison of Rides by Day:
- Weekdays (Monday to Friday) see higher engagement from members, likely due to regular commuting patterns.
- Weekends (especially Saturday) show a shift, with casual riders becoming more active and narrowing the gap with members.
- On Saturday, the number of rides from casual riders surpasses that of members, indicating a preference for weekend leisure activities.

Insights:

Weekday vs. Weekend Dynamics:
- Members seem to use the service more for routine purposes during the weekdays, such as commuting.
- Casual riders show a stronger preference for using the service on weekends, likely for recreational activities or leisurely outings.
Opportunity for Weekend Promotions:
- With a surge in casual rider activity on weekends, there is potential to introduce weekend-specific promotions or packages to further capitalize on this trend.
- For members, offering incentives like weekday ride challenges or rewards for increased weekday usage could help sustain higher engagement levels.
Service Adaptation for Peak Days:
- Understanding the peak days for each rider group allows better allocation of resources, such as ensuring more bike availability on Wednesdays for members and Saturdays for casual riders.
- This can improve customer satisfaction and reduce instances of bike shortages during high-demand periods.

Total Number of Rides by User Type and Day of the Week

Key Observations:

Highest Ride Counts on Weekends:
- Saturday and Sunday see the highest total number of rides, with over 50,000 rides each day.
- On weekends, casual users contribute significantly more rides than members, particularly on Saturday.
Weekday Ride Patterns:
- Monday through Friday have a more balanced and consistent number of rides, with members making up the majority of the rides on these days.
- The overall number of rides is slightly lower midweek (Wednesday and Thursday), compared to Monday and Tuesday.
Member vs. Casual Split:
- Members ride consistently throughout the week, with little fluctuation in numbers.
- Casual users show a significant spike in ridership on weekends, particularly on Saturday, but their numbers are much lower during weekdays.
Friday's Mixed User Base:
- Friday shows a relatively even split between casual and member rides, indicating a possible shift in behavior as the weekend approaches.

Insights:

Weekend Leisure Activity:
- The spike in casual users during the weekends suggests that this user group likely uses the service for leisure or recreational purposes.
Commuting Patterns for Members:
- Members tend to ride consistently throughout the week, indicating that they may use the service for regular commuting, work, or errands.
Marketing Opportunities for Casual Users:
- The high weekend activity of casual users presents an opportunity for targeted promotions or services tailored to leisure rides, possibly encouraging casual users to ride more frequently during the weekdays.
Steady Member Engagement:
- The consistent use by members suggests a loyal base that relies on the service regularly, and this could be leveraged for long-term retention strategies, such as loyalty programs or exclusive benefits during weekends.

Start and end station use - Tableau

In the Tableau Dashboard I created, which is again available here, there is a worksheet that allows the exploration of ride patterns by start and end station, broken down by members, casual riders, and overall combined data. The snapshot below highlights the overall view. When interacting with the dashboard, we observe that casual riders tend to use a smaller number of stations more frequently, resulting in larger, more concentrated ride counts at a few key locations. In contrast, annual members demonstrate a more evenly distributed ride pattern across many stations, with a wider variety of station usage reflected in the map's color range. This suggests that casual rides are concentrated in popular locations, while member rides are more spread out across the city, showing a broader, more consistent use of the system.
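
The concentration described above can also be expressed as a single number: the share of a rider group's trips that start at its busiest stations. The sketch below is a minimal illustration of that idea and is separate from the Tableau dashboard. The field names `start_station_name` and `member_casual`, the placeholder station names, and the sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical rows: (start_station_name, member_casual).
sample_rides = [
    ("Station A", "casual"), ("Station A", "casual"), ("Station A", "casual"),
    ("Station B", "casual"),
    ("Station A", "member"), ("Station B", "member"),
    ("Station C", "member"), ("Station D", "member"),
]

def top_station_share(rows, rider_type, top_n=1):
    """Share of a rider type's rides that start at its top_n busiest stations."""
    stations = Counter(name for name, rider in rows if rider == rider_type)
    total = sum(stations.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in stations.most_common(top_n))
    return top / total

casual_share = top_station_share(sample_rides, "casual")
member_share = top_station_share(sample_rides, "member")
print(f"casual: {casual_share:.0%} of rides start at the busiest station")
print(f"member: {member_share:.0%} of rides start at the busiest station")
```

A higher share for one group would indicate that its rides cluster at fewer stations, which matches the kind of pattern the map shows for casual riders.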

Summary Analysis

Cyclistic, a bike-share company in Chicago, has set its sights on expanding its base of annual members. The company’s finance analysts have found that annual members are significantly more profitable than casual riders, which is why Cyclistic’s marketing team, led by Lily Moreno, is focusing on converting casual riders into members rather than acquiring entirely new customers. Casual riders are already familiar with Cyclistic’s services, making them prime candidates for this shift.
The analysis of user behaviour across several key metrics, such as ride duration, bike preference, and usage by day of the week, reveals distinct patterns between casual riders and annual members. Casual riders typically take longer rides (about 27.5 minutes), most frequently during weekends, whereas members take shorter rides (around 15 minutes), with consistent usage throughout the workweek. Furthermore, members show a clear preference for classic bikes, while electric bikes have lower overall adoption, especially among casual riders. These insights provide several opportunities for tailored marketing strategies to convert casual riders into annual members.

Key Marketing Suggestions to Convert Casual Riders to Members

1. Weekend-Centric Promotions

Since casual riders are most active during the weekends, with the highest ridership on Saturdays, Cyclistic can leverage this by creating weekend membership offers. Casual riders who frequently use the service for leisure rides could be enticed with weekend-only discounts or special weekend rates for members.
Additionally, offering extended weekend trial memberships that allow casual riders to experience the benefits of being a member might encourage them to see the value of becoming full-time members. This could be particularly effective if coupled with promotions for longer trips, as casual riders are already inclined to take longer rides.

2. Incentives for Longer Rides

Given that casual riders tend to have longer ride durations, a tiered pricing structure could offer better rates for longer trips but only for members. By framing annual membership as a cost-effective option for longer rides, casual users may feel incentivized to switch. For example, casual riders could receive a message after their ride showing how much they would have saved if they had been members.
Another option could be creating membership add-ons for casual users, where after a certain number of longer trips, casual riders can unlock discounted rates on an annual membership.
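
The post-ride savings message could be driven by a small calculation like the one sketched below. This is only an illustration of the idea. The pricing model (an unlock fee plus a per-minute rate for casual rides, and a flat annual fee covering all member rides) and every price shown are hypothetical and are not Cyclistic's actual rates.

```python
def membership_savings_cents(ride_minutes, unlock_fee_cents,
                             per_minute_cents, annual_fee_cents):
    """Estimate savings (in cents) if a casual rider had held a membership.

    Simplifying assumption: casual riders pay an unlock fee plus a
    per-minute rate on every ride, while members pay only the annual fee.
    A negative result means membership would have cost more.
    """
    casual_cost = sum(unlock_fee_cents + minutes * per_minute_cents
                      for minutes in ride_minutes)
    return casual_cost - annual_fee_cents

# Hypothetical rider: 40 rides of 28 minutes each over a year.
rides = [28] * 40
savings = membership_savings_cents(
    rides,
    unlock_fee_cents=100,     # hypothetical price
    per_minute_cents=15,      # hypothetical price
    annual_fee_cents=12000,   # hypothetical price
)

if savings > 0:
    print(f"As a member, you would have saved ${savings / 100:.2f} this year.")
else:
    print("Membership would not have saved you money on these rides.")
```

Working in whole cents avoids floating-point rounding in the totals, and returning a signed value lets the message be shown only to riders for whom membership is actually cheaper.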

3. Mid-Week Engagement Campaigns

Casual riders show significantly less activity during the workweek, whereas members have consistent ride patterns. To shift casual riders’ behavior and encourage them to use the service during the week, Cyclistic could introduce mid-week incentives.
Special offers on weekdays, such as “Ride More, Save More” promotions or reward programs for casual riders who complete a certain number of weekday rides, could incentivize more frequent usage. This approach could familiarize casual riders with the regularity of the service, similar to how members use it for commuting.

4. Electric Bike Promotions

Both casual and member riders currently underutilize electric bikes, but promoting their speed and convenience could be a game-changer, especially for casual riders who take longer rides.
Cyclistic could run electric bike trials for casual riders or offer reduced pricing on electric bike rides exclusively for members. By doing this, casual riders would be motivated to experience the added convenience, and if they find electric bikes appealing, it could nudge them toward membership, where they receive exclusive electric bike benefits.

5. Loyalty and Referral Programs

To retain casual riders and encourage them to commit to an annual membership, Cyclistic could implement a loyalty program where casual riders accumulate points for every ride. These points could then be redeemed for membership discounts or free membership trials.
A referral program for members could also encourage existing members to bring in casual riders, potentially converting them through word-of-mouth recommendations and offering both parties rewards for the successful conversion.

6. Targeted Digital Marketing Campaigns

Cyclistic should invest in digital marketing that specifically targets casual riders using data from their ride history. Casual riders who tend to take frequent or longer rides could receive personalized ads or emails showcasing how membership would save them money and offer additional perks, such as exclusive discounts, priority access to bikes, or free ride credits.
Leveraging social media platforms like Instagram or Facebook, where casual riders may already be active, can enhance brand visibility and engagement. Campaigns focusing on the lifestyle benefits of membership, such as improved accessibility for regular rides and fun weekend outings, would resonate well with the casual audience.

Conclusion

Cyclistic’s focus on converting casual riders into annual members is a viable strategy based on the data. Casual riders are already engaged with the service, particularly during weekends, and tend to take longer, leisurely rides. By emphasizing the cost-effectiveness of membership for these longer rides, offering targeted promotions on weekends and weekday engagement incentives, and promoting the benefits of electric bikes, Cyclistic can effectively convert a large portion of its casual rider base into more profitable annual members.
To achieve this, Cyclistic should implement a multi-faceted marketing strategy combining pricing incentives, targeted digital campaigns, and customer loyalty programs that capitalize on the distinct usage patterns of casual riders. This approach will not only increase membership but also strengthen Cyclistic’s presence as the go-to bike-share service for both commuters and leisure riders alike.

5. Share

Stakeholder presentation and dashboard

I’ve provided links below for my dashboard and stakeholder presentation, which includes the following:

A summary of my analysis
Supporting visualizations and key findings
Three recommendations based on my analysis

Presentation: How Does a Bike-Share Navigate Speedy Success? - Nithika Pidikiti
Tableau Dashboard: Cyclist Bike-Share in Chicago - Nithika Pidikiti
