You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximising the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualisations.
This is my presentation: How Does a Bike-Share Navigate Speedy Success? - Nithika Pidikiti
Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.
Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic's marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals, as well as how you, as a junior data analyst, can help Cyclistic achieve them.
Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic's marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic's finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Client/Sponsor:
Cyclistic Marketing Team
Purpose:
Cyclistic is focused on increasing its profitability by boosting the number of annual memberships. To achieve this goal, it is essential to understand how annual members and casual riders differ in their use of Cyclistic bikes. By analysing historical bike trip data, the marketing team aims to identify distinct usage patterns between these two groups. This understanding will guide the design of targeted marketing strategies aimed at converting casual riders into annual members. The insights gained from this analysis will be critical for presenting actionable recommendations to the Cyclistic executive team, who will make the final decision on whether to implement the proposed marketing strategy.
The primary problem is to discern how annual members and casual riders differ in their bike usage. Addressing this issue will enable the development of marketing strategies tailored to increase annual memberships. Insights from this analysis will reveal specific behaviours and preferences of each rider type, which can be used to craft targeted marketing campaigns and promotions that resonate with casual riders and encourage them to opt for annual memberships.
Scope / Major Project Activities:
Data Collection - Collect historical bike trip data to understand usage patterns of annual members and casual riders.
Data Analysis - Analyse the data to identify differences in trip frequency, duration, time of day, and location.
Insights and Findings - Extract key insights regarding the behaviours and preferences of each rider type.
Recommendation - Develop targeted marketing strategies based on the analysis to convert casual riders to annual members.
Final Report - Compile a comprehensive report detailing the differences in bike usage, supported by data visualisations and actionable recommendations.
This project does not include:
Implementing any marketing strategies or campaigns.
Analysis of bike usage data beyond the historical scope provided.
Deliverables:
Usage Pattern Analysis - Detailed analysis of how annual members and casual riders use Cyclistic bikes differently.
Insights Report - Key findings regarding the behaviours and preferences of each rider type.
Marketing Recommendations - Targeted strategies to convert casual riders into annual members based on the data.
Final Report - Comprehensive report including data-driven insights and visualisations for executive review.
Estimated date of Completion:
October 10, 2024
Download the Cyclistic trip data here. I used data from July 9th, 2024 (03:17:18 AM) to Sep 5th, 2024 (10:20:26 AM), since this was the latest data available (this project was done in 2024). The data is organised in CSV files, one per month, each structured into 13 columns. The dataset is provided by Motivate International Inc. under this license. It contains no personal information about the riders, ensuring privacy and compliance with data protection standards.
To verify the data’s integrity, all columns were checked for consistency and proper data types. The dataset is highly suitable for the case study as it aligns with the business questions being addressed. However, there are some limitations, particularly with missing information regarding station names and station IDs. Despite this, the dataset remains adequate for analysis and supports the objectives of this case study. There are 3 CSV files in total.
It is structured data, organised in rows (records) and columns (fields). Each record represents one trip, and each trip has a unique field that identifies it: ride_id. Each trip is anonymised and includes the following fields:
* ride_id #Ride id - unique
* rideable_type #Bike type - Classic, Docked, Electric
* started_at #Trip start day and time
* ended_at #Trip end day and time
* start_station_name #Trip start station
* start_station_id #Trip start station id
* end_station_name #Trip end station
* end_station_id #Trip end station id
* start_lat #Trip start latitude
* start_lng #Trip start longitude
* end_lat #Trip end latitude
* end_lng #Trip end longitude
* member_casual #Rider type - Member or Casual
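One way to confirm that a monthly file matches the field list above is to compare its column names against the expected set. The following is a minimal base R sketch, not part of the original workflow: `expected_cols`, `check_columns`, and the toy data frame are illustrative names and made-up values.

```r
# Expected field names, taken from the field list above
expected_cols <- c(
  "ride_id", "rideable_type", "started_at", "ended_at",
  "start_station_name", "start_station_id",
  "end_station_name", "end_station_id",
  "start_lat", "start_lng", "end_lat", "end_lng", "member_casual"
)

# Hypothetical helper: report missing and unexpected columns in a data frame
check_columns <- function(df, expected = expected_cols) {
  list(
    missing = setdiff(expected, names(df)),
    unexpected = setdiff(names(df), expected)
  )
}

# Toy data frame standing in for one monthly file (made-up values)
toy <- data.frame(ride_id = "A1", rideable_type = "classic_bike",
                  Ride_length = 120, stringsAsFactors = FALSE)

result <- check_columns(toy)
print(result$missing)     # expected fields the file lacks
print(result$unexpected)  # extra columns, such as ones added in Excel
```

Columns added later in Excel (for example a ride length column) would show up under `unexpected`, which is a quick reminder to rename them consistently before combining files.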
For this project, I used Excel for initial data manipulation, R for more complex statistical analyses and visualisations, and Tableau for additional visualisations. R is particularly effective for conducting in-depth analysis, generating insights, and producing dynamic visualisations, while Excel facilitated quick visualisations and preliminary exploration of the data. To maintain data integrity, I conducted thorough checks for consistency and correctness across the dataset.
The following steps were taken to ensure the data is clean and ready for analysis:
Data validation was performed to identify potential errors.
Null data was highlighted using conditional formatting for further investigation.
Mistyped words and numbers were corrected through systematic checks.
Extra spaces and characters were removed using the trim function.
Duplicate values were handled with the distinct function.
Mismatched data types were corrected to align with the required formats.
Inconsistent strings were standardised for consistency.
Date formats were unified across the dataset.
Misleading variable labels were clarified to ensure accurate interpretation.
Truncated data and other inconsistencies were resolved.
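A few of the cleaning steps above (counting nulls, trimming spaces, standardising strings, removing duplicates) can be sketched in base R as follows. This is an illustration on made-up rows, assuming column names from the field list; the actual cleaning was done in Excel and with the dplyr functions shown later.

```r
# Toy trips with the kinds of problems listed above (made-up values)
trips <- data.frame(
  ride_id = c("A1", "A2", "A2", "A3"),
  start_station_name = c(" Clark St ", "Clark St", "Clark St", NA),
  member_casual = c("member", "Casual ", "Casual ", "casual"),
  stringsAsFactors = FALSE
)

# Count missing values per column
na_counts <- colSums(is.na(trips))

# Trim extra spaces and standardise case in character columns
trips$start_station_name <- trimws(trips$start_station_name)
trips$member_casual <- tolower(trimws(trips$member_casual))

# Drop exact duplicate rows (dplyr::distinct() does the same job)
trips_clean <- trips[!duplicated(trips), ]

print(na_counts)
print(trips_clean)
```

Trimming and case-folding before removing duplicates matters here: the two "A2" rows only become identical to each other, and comparable to the other rider labels, once the stray spaces and capital letters are gone.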
I have documented the entire process (including cleaning) within R in the analyse section, enabling a transparent review and the ability to share these results when needed. This thorough documentation allows for easy tracking and verification of all the steps taken to prepare the data for analysis.
Our next step is to ensure the data is stored appropriately and prepared for analysis. To achieve this, I downloaded all 3 zip files, unzipped them, and created a temporary folder on my desktop to house the files. I then organised the files into subfolders for .CSV files and .XLS files, maintaining a copy of the original data. Next, I launched Excel and opened each file, saving each one as an Excel Workbook file (.xlsx) to ensure compatibility and ease of analysis. For each of the 3 Excel files, I performed the following operations:
Changed format of started_at and ended_at columns
Formatted as custom DATETIME
Format > Cells > Custom > dd/mm/yyyy h:mm:ss
Created a column called ride_length
Calculated the length of each ride by subtracting the column started_at from the column ended_at - using the formula = [@[ended_at]] - [@[started_at]]
Formatted as TIME
Format > Cells > Time > HH:MM:SS (13:30:55)
Created a column called day_of_week
Calculated the day of the week that each ride started using the WEEKDAY command (example: =WEEKDAY(D2,1))
Formatted as a NUMBER with no decimals
Format > Cells > Number (no decimals) > 1,2,3,4,5,6,7
Note: 1 = Sunday and 7 = Saturday
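For comparison, the two Excel columns above can be reproduced in base R. This is a sketch on two made-up timestamps, assuming the same numbering as `WEEKDAY(x, 1)`, where 1 = Sunday and 7 = Saturday; `as.POSIXlt()$wday` counts from 0 = Sunday, so 1 is added.

```r
# Two made-up rides; real values would come from started_at and ended_at
started_at <- as.POSIXct(c("2024-07-09 08:00:00", "2024-07-14 10:00:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2024-07-09 08:12:30", "2024-07-14 10:05:00"), tz = "UTC")

# ride_length: ended_at minus started_at, expressed in seconds
ride_length <- as.numeric(difftime(ended_at, started_at, units = "secs"))

# day_of_week: same numbering as Excel's WEEKDAY(x, 1)
day_of_week <- as.POSIXlt(started_at)$wday + 1

print(ride_length)  # 750 and 300 seconds
print(day_of_week)  # 3 (Tuesday) and 1 (Sunday)
```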
FINISHED EXCEL SHEETS SMALL PREVIEW
# Install required packages
install.packages("tidyverse")
install.packages("readxl")
install.packages("lubridate")
# Load the libraries
library(tidyverse)
library(readxl)
library(lubridate)
# Load the Excel datasets
data_august <- read_excel("~/Desktop/CASE STUDY 1 /3/untitled folder/AUGUST DATA - NEETHU.xlsx")
data_july <- read_excel("~/Desktop/CASE STUDY 1 /3/JUY/Neethuv1 JULY(AutoRecovered).xlsx")
data_september <- read_excel("~/Desktop/CASE STUDY 1 /3/SEP/Sep 2024 neethu data.xlsx")
# Verify that data has been imported correctly
print(head(data_august)) # Check first few rows of August data
print(head(data_july)) # Check first few rows of July data
print(head(data_september)) # Check first few rows of September data
Inspect Column Names and Data Structure
# Check column names for each file
colnames(data_august)
colnames(data_july)
colnames(data_september)
# Inspect the structure of each file
str(data_august)
str(data_july)
str(data_september)
Rename Columns for Consistency (if necessary)
# Standardize column names across all datasets
# For August data, only "Ride_length" needs renaming; "day_of_week" already has the target name
data_august <- rename(data_august,
  ride_length = "Ride_length") # Ensure consistent lowercase
# For July data, rename "Source.Name" and "Ride_length"; "day_of_week" already has the target name
data_july <- rename(data_july,
  ride_length = "Ride_length", # Ensure consistent lowercase
  source_name = "Source.Name") # Renaming Source.Name
# For September data, rename "Weekday" to match other files
data_september <- rename(data_september,
  day_of_week = "Weekday") # Match "Weekday" to "day_of_week"
#verification of changes
colnames(data_august)
colnames(data_july)
colnames(data_september)
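Before combining the files, it can help to check programmatically which column names are not shared by all three data frames, since `bind_rows()` keeps mismatched names as separate columns filled with `NA`. The sketch below uses base R and toy stand-ins for the monthly data; the data frame contents are made up.

```r
# Toy stand-ins for the three monthly data frames (made-up values)
july <- data.frame(ride_id = "J1", ride_length = 100, day_of_week = 2)
august <- data.frame(ride_id = "A1", ride_length = 200, day_of_week = 3)
september <- data.frame(ride_id = "S1", ride_length = 300, Weekday = 4)

months <- list(july = july, august = august, september = september)

# Columns present in every data frame
common_cols <- Reduce(intersect, lapply(months, names))

# Columns in each data frame that are not shared by all of them
mismatched <- lapply(months, function(df) setdiff(names(df), common_cols))

print(common_cols)
print(mismatched)
```

An empty result for every month means the names line up; anything listed under a month is a candidate for renaming.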
More Conversions
library(dplyr)
library(lubridate) # For date-time conversion
Convert data time columns:
# Convert date-time columns in data_august
data_august <- data_august %>%
mutate(
started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"),
ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")
)
# Convert date-time columns in data_july
data_july <- data_july %>%
mutate(
started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"),
ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")
)
# Convert date-time columns in data_september
data_september <- data_september %>%
mutate(
started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"),
ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")
)
Convert end_station_id Columns: Convert the end_station_id column in each dataset to character type
# Convert end_station_id to character in data_august
data_august <- data_august %>%
mutate(end_station_id = as.character(end_station_id))
# Convert end_station_id to character in data_july
data_july <- data_july %>%
mutate(end_station_id = as.character(end_station_id))
# Convert end_station_id to character in data_september
data_september <- data_september %>%
mutate(end_station_id = as.character(end_station_id))
Convert start_lat Columns: Convert the start_lat column in each dataset to double type for consistency
# Convert start_lat to double in data_august
data_august <- data_august %>%
mutate(start_lat = as.double(start_lat))
# Convert start_lat to double in data_july
data_july <- data_july %>%
mutate(start_lat = as.double(start_lat))
# Convert start_lat to double in data_september
data_september <- data_september %>%
mutate(start_lat = as.double(start_lat))
# Convert start_lng to double in all datasets
data_august <- data_august %>%
mutate(start_lng = as.double(start_lng))
data_july <- data_july %>%
mutate(start_lng = as.double(start_lng))
data_september <- data_september %>%
mutate(start_lng = as.double(start_lng))
# Convert end_lat to double in all datasets
data_august <- data_august %>%
mutate(end_lat = as.double(end_lat))
data_july <- data_july %>%
mutate(end_lat = as.double(end_lat))
data_september <- data_september %>%
mutate(end_lat = as.double(end_lat))
# Convert end_lng to double in all datasets
data_august <- data_august %>%
mutate(end_lng = as.double(end_lng))
data_july <- data_july %>%
mutate(end_lng = as.double(end_lng))
data_september <- data_september %>%
mutate(end_lng = as.double(end_lng))
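The conversions above repeat the same steps for each month. One possible way to reduce the repetition is a small helper applied to every data frame. This is a base R sketch under the assumption that all three files share the same column names; `standardise_types` is a hypothetical name and the toy row is made up.

```r
# Hypothetical helper that applies the same type conversions to one data frame
standardise_types <- function(df) {
  df$started_at <- as.POSIXct(df$started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df$ended_at <- as.POSIXct(df$ended_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df$end_station_id <- as.character(df$end_station_id)
  coord_cols <- c("start_lat", "start_lng", "end_lat", "end_lng")
  df[coord_cols] <- lapply(df[coord_cols], as.double)
  df
}

# Toy stand-in for one monthly data frame (made-up values)
toy <- data.frame(
  started_at = "2024-08-01 09:00:00",
  ended_at = "2024-08-01 09:10:00",
  end_station_id = 13022,
  start_lat = "41.88", start_lng = "-87.63",
  end_lat = "41.89", end_lng = "-87.62",
  stringsAsFactors = FALSE
)

# Apply the helper to every data frame in a named list
converted <- lapply(list(august = toy), standardise_types)
str(converted$august)
```

With the real data, the list would hold `data_august`, `data_july`, and `data_september`, and the converted elements could then be combined in one step.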
# Combine the three datasets into one dataframe
all_trips <- bind_rows(data_august, data_july, data_september)
# Check the structure of the combined dataframe
str(all_trips)
# Optionally, view the first few rows
head(all_trips)
Remove unnecessary columns
# Remove unnecessary columns
all_trips <- all_trips %>%
select(-c(start_lat, start_lng, end_lat, end_lng))
Handle missing or incorrect Data
# Remove rows with missing start/end times
all_trips <- all_trips %>%
filter(!is.na(started_at) & !is.na(ended_at)) # Ensure there are no missing start/end times
# Convert ride_id to character (if it's not already)
all_trips <- all_trips %>%
mutate(ride_id = as.character(ride_id))
# Standardize "member_casual" column values
all_trips <- all_trips %>%
mutate(member_casual = recode(member_casual, "Subscriber" = "member", "Customer" = "casual"))
Add Date-Related Columns
# Ensure 'started_at' is in POSIXct format
all_trips$started_at <- as.POSIXct(all_trips$started_at)
# Add new columns for date, month, day, year, and day of the week
all_trips$date <- as.Date(all_trips$started_at) # Extract date
all_trips$month <- format(all_trips$date, "%m") # Extract month
all_trips$day <- format(all_trips$date, "%d") # Extract day
all_trips$year <- format(all_trips$date, "%Y") # Extract year
all_trips$day_of_week <- format(all_trips$date, "%A") # Extract day of the week
# Calculate the ride length in seconds
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")
# Convert ride_length to numeric (if necessary)
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
# Remove rides with zero or negative durations
all_trips <- all_trips %>% filter(ride_length > 0)
Summary Statistics for Ride Length
# Calculate basic summary statistics for ride length
mean_ride_length <- mean(all_trips$ride_length, na.rm = TRUE)
median_ride_length <- median(all_trips$ride_length, na.rm = TRUE)
max_ride_length <- max(all_trips$ride_length, na.rm = TRUE)
min_ride_length <- min(all_trips$ride_length, na.rm = TRUE)
# Display the summary statistics
mean_ride_length
median_ride_length
max_ride_length
min_ride_length
Grouped Analysis by User Type
# Summary statistics by user type (member vs casual)
user_summary <- all_trips %>%
group_by(member_casual) %>%
summarise(
mean_ride_length = mean(ride_length, na.rm = TRUE),
median_ride_length = median(ride_length, na.rm = TRUE),
max_ride_length = max(ride_length, na.rm = TRUE),
min_ride_length = min(ride_length, na.rm = TRUE)
)
# Display user_summary
print(user_summary)
install.packages("munsell")
install.packages("ggplot2")
packageVersion("ggplot2")
install.packages("farver")
Average Ride Length by User Type
# Plot the average ride length by user type
ggplot(all_trips, aes(x = member_casual, y = ride_length, fill = member_casual)) +
geom_bar(stat = "summary", fun = "mean") +
labs(title = "Average Ride Length by User Type", x = "User Type", y = "Average Ride Length (secs)")
Number of Rides by Day of the Week
# Plot the number of rides by day of the week
ggplot(all_trips, aes(x = day_of_week, fill = member_casual)) +
geom_bar() +
labs(title = "Number of Rides by Day of the Week", x = "Day of Week", y = "Number of Rides")
Histogram of Ride Lengths
# Load required libraries
library(dplyr)
library(ggplot2)
library(lubridate)
# Create all_trips_v2 from the cleaned all_trips data frame,
# recomputing ride_length (in seconds) and day_of_week from the timestamps
all_trips_v2 <- all_trips %>%
  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "secs")),
         day_of_week = weekdays(started_at)) # Create the required columns
# Check if all_trips_v2 is created successfully
print(head(all_trips_v2)) # View the first few rows
# Now run the ggplot commands
ggplot(all_trips_v2, aes(x = ride_length)) +
geom_histogram(binwidth = 60, fill = "blue", color = "black", alpha = 0.7) +
labs(title = "Distribution of Ride Lengths", x = "Ride Length (seconds)", y = "Number of Rides") +
theme_minimal()
Boxplot of Ride Lengths by Member Type
# Check the structure of all_trips_v2
str(all_trips_v2)
# View the first few rows of your data frame
head(all_trips_v2)
# Ensure ride_length is numeric
all_trips_v2$ride_length <- as.numeric(all_trips_v2$ride_length)
# Convert member_casual to a factor if not already
all_trips_v2$member_casual <- as.factor(all_trips_v2$member_casual)
library(ggplot2)
# Create a boxplot for ride_length by member_casual
ggplot(all_trips_v2, aes(x = member_casual, y = ride_length, fill = member_casual)) +
geom_boxplot() +
labs(title = "Ride Length Distribution by User Type", x = "User Type", y = "Ride Length (seconds)") +
theme_minimal()
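# Check for NA values
sum(is.na(all_trips_v2$ride_length))
# Optionally remove rows with NA in ride_length
# (na.omit() on the whole data frame would also drop rows with missing station names or IDs)
all_trips_v2 <- all_trips_v2 %>% filter(!is.na(ride_length))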
summary(all_trips_v2$ride_length)
Average Ride Duration Over Days of the Week
library(dplyr)
library(ggplot2)
library(lubridate) # Ensure you have lubridate loaded for wday()
# Create the line plot for average ride duration by day of the week
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(average_duration = mean(ride_length), .groups = "drop") %>%
ggplot(aes(x = weekday, y = average_duration, color = member_casual, group = member_casual)) +
geom_line(linewidth = 1) + # Updated from size to linewidth
geom_point(size = 3) +
labs(title = "Average Ride Duration by Day of the Week", x = "Day of Week", y = "Average Duration (seconds)") +
theme_minimal() +
scale_y_continuous(labels = scales::comma)
Ride Count by Weekday and Member Type
# Ride Count by Weekday and Member Type (Stacked Bar Plot)
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n(), .groups = "drop") %>%
ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
geom_bar(stat = "identity") +
labs(title = "Total Number of Rides by User Type and Day of the Week", x = "Day of Week", y = "Number of Rides") +
theme_minimal()
Average Ride Length for Each User Type Over Time
# Average Ride Length Over Time
all_trips_v2 %>%
mutate(date = as.Date(started_at)) %>% # Extract date from datetime
group_by(date, member_casual) %>%
summarise(average_duration = mean(ride_length), .groups = "drop") %>%
ggplot(aes(x = date, y = average_duration, color = member_casual)) +
geom_line() +
labs(title = "Average Ride Length Over Time by User Type", x = "Date", y = "Average Ride Length (seconds)") +
theme_minimal()
Annual Members vs Casual Riders (Pie chart)
library(dplyr)
library(ggplot2)
# Summarize data: Count of rides by user type and calculate percentage
percentage_data <- all_trips_v2 %>%
group_by(member_casual) %>%
summarise(number_of_rides = n()) %>%
mutate(percentage = (number_of_rides / sum(number_of_rides)) * 100) # Calculate percentage
# Create pie chart for percentage of rides by user type
ggplot(percentage_data, aes(x = "", y = percentage, fill = member_casual)) +
geom_bar(stat = "identity", width = 1) + # Create the pie chart
coord_polar("y") + # Convert to polar coordinates for pie chart
labs(title = "Annual Members vs Casual Riders",
fill = "User Type",
y = "Percentage of Rides") +
theme_void() + # Use a void theme for a cleaner look
theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold")) + # Center title
geom_text(aes(label = paste0(round(percentage, 1), "%")),
position = position_stack(vjust = 0.5),
color = "white") # Add percentage labels to the pie slices
Quarterly Trip Totals (July, August, September)
# Filter data for Q3 (July, August, September)
q3_data <- all_trips_v2 %>%
filter(month(started_at) %in% c(7, 8, 9))
# Summarize total rides for members vs casual riders in Q3
q3_totals <- q3_data %>%
group_by(member_casual) %>%
summarise(total_rides = n())
# Visualize total rides by user type
ggplot(q3_totals, aes(x = member_casual, y = total_rides, fill = member_casual)) +
geom_bar(stat = "identity") +
labs(title = "Total Rides in Q3 (July, August, September)", x = "User Type", y = "Total Rides") +
theme_minimal() +
scale_y_continuous(labels = scales::comma)
Median Ride Length by User Type
library(dplyr)
library(ggplot2)
library(lubridate) # Make sure to load lubridate for month() function
# Check if all_trips_v2 is loaded
if (exists("all_trips_v2")) {
# Filter data for Q3 (July, August, September)
q3_data <- all_trips_v2 %>%
filter(month(started_at) %in% c(7, 8, 9))
# Check if filtered data is valid
if (nrow(q3_data) > 0) {
# Calculate median ride length for members and casual riders
q3_median_ride_length <- q3_data %>%
group_by(member_casual) %>%
summarise(median_ride_length = median(ride_length, na.rm = TRUE)) # Remove NA values
# Visualize median ride length for members vs casual riders
ggplot(q3_median_ride_length, aes(x = member_casual, y = median_ride_length, fill = member_casual)) +
geom_bar(stat = "identity") +
labs(title = "Median Cyclistic Bike Ride Length", x = "User Type", y = "Median Ride Length (seconds)") +
theme_minimal() +
scale_y_continuous(labels = scales::comma) # Format y-axis with commas
} else {
print("Filtered data for Q3 is empty.")
}
} else {
print("Dataset 'all_trips_v2' is not loaded.")
}
Ride Duration by Day of the Week
# Add a weekday column
q3_data <- q3_data %>%
mutate(weekday = wday(started_at, label = TRUE))
# Summarize median ride length by day of week and user type
q3_ride_duration_by_day <- q3_data %>%
group_by(member_casual, weekday) %>%
summarise(median_ride_length = median(ride_length, na.rm = TRUE), .groups = 'drop') # Add .groups argument to avoid warning
# Visualize median ride length by day of the week
ggplot(q3_ride_duration_by_day, aes(x = weekday, y = median_ride_length, color = member_casual, group = member_casual)) +
geom_line(linewidth = 1) +
geom_point(size = 3) +
labs(title = "Median Ride Length by Day of Week", x = "Day of Week", y = "Median Ride Length (seconds)") + # Updated title
theme_minimal() +
scale_y_continuous(labels = scales::comma)
Bike Type Preferences
# Summarize bike type preferences by user type
q3_bike_type_pref <- q3_data %>%
group_by(member_casual, rideable_type) %>%
summarise(total_rides = n(), .groups = 'drop') # Added .groups argument to avoid warning
# Visualize bike type preferences by user type
ggplot(q3_bike_type_pref, aes(x = rideable_type, y = total_rides, fill = member_casual)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Bike Type Preferences", x = "Bike Type", y = "Total Rides") + # Updated title
theme_minimal() +
scale_y_continuous(labels = scales::comma)
Total Rides by Day of the Week (Casual Riders vs Annual Members)
# Summarize total rides by day of week and user type
q3_rides_by_day <- q3_data %>%
group_by(member_casual, weekday) %>%
summarise(total_rides = n(), .groups = 'drop') # Added .groups argument to avoid warning
# Visualize total rides by day of the week
ggplot(q3_rides_by_day, aes(x = weekday, y = total_rides, fill = member_casual)) +
geom_col(position = "dodge") +
labs(title = "Total Rides by the Day of the Week", x = "Day of Week", y = "Total Rides") + # Updated title
theme_minimal() +
scale_y_continuous(labels = scales::comma)
Small snippets of my code in R
After cleaning the data, the next step was to analyse in R how annual members and casual riders use Cyclistic bikes differently. The analysis combined statistical summaries and visualisations. Key metrics were summary statistics for ride length, both overall and grouped by user type, and average ride length comparisons. I also explored the number of rides by day of the week, along with histograms and box-plots of ride lengths by member type. To look further into usage trends, I analysed average ride duration across the week, ride counts by weekday and user type, and average ride lengths over time. The visualisations also include a pie chart comparing annual members to casual riders, quarterly trip totals for July, August, and September, median ride lengths by user type, ride duration by day of the week, bike type preferences, and total rides by day of the week for both user types. I also created a visualisation on Tableau Public: https://public.tableau.com/shared/6DSN67Y6T?:display_count=n&:origin=viz_share_link
My insights and visualisations are shown below:
During these 3 months, members accounted for 56.5% of Cyclistic's total trips, while casual riders accounted for 43.5%. However, the percentages fluctuate from month to month.
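The overall and month-by-month shares can be computed with `table()` and `prop.table()`. The sketch below uses eight made-up rides purely to show the calculation; the counts are not the project's data, and the column names `month` and `member_casual` are assumed from the script above.

```r
# Made-up rides: month of the trip and rider type
toy <- data.frame(
  month = c("07", "07", "07", "07", "08", "08", "08", "08"),
  member_casual = c("member", "member", "member", "casual",
                    "member", "member", "casual", "casual"),
  stringsAsFactors = FALSE
)

# Overall share of rides by rider type, in percent
overall <- round(100 * prop.table(table(toy$member_casual)), 1)

# Share by rider type within each month (each row sums to 100)
by_month <- round(100 * prop.table(table(toy$month, toy$member_casual), margin = 1), 1)

print(overall)
print(by_month)
```

Using `margin = 1` normalises within each month, which is what makes a month-to-month change in the member share visible even when total ride counts differ between months.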
In the Tableau Dashboard I created, which is again available here, there is a worksheet that allows the exploration of ride patterns by start and end station, broken down by members, casual riders, and overall combined data. The snapshot below highlights the overall view. When interacting with the dashboard, we observe that casual riders tend to use a smaller number of stations more frequently, resulting in larger, more concentrated ride counts at a few key locations. In contrast, annual members demonstrate a more evenly distributed ride pattern across many stations, with a wider variety of station usage reflected in the map's color range. This suggests that casual rides are concentrated in popular locations, while member rides are more spread out across the city, showing a broader, more consistent use of the system.
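The concentration pattern described above could also be checked numerically, for example by measuring what share of each rider type's trips start at its busiest stations. The following base R sketch is an assumption about how such a check might look, not the method used for the dashboard; `top_k_share` is a hypothetical helper and the station data is made up.

```r
# Hypothetical helper: share of rides that start at the k busiest stations
top_k_share <- function(stations, k = 1) {
  counts <- sort(table(stations), decreasing = TRUE)
  sum(head(counts, k)) / sum(counts)
}

# Made-up rides: casual rides cluster at station "A", member rides are spread out
toy <- data.frame(
  member_casual = c(rep("casual", 6), rep("member", 6)),
  start_station_name = c("A", "A", "A", "A", "B", "C",
                         "A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Share of each rider type's rides that start at its single busiest station
shares <- tapply(toy$start_station_name, toy$member_casual, top_k_share, k = 1)
print(round(shares, 3))
```

A higher share for casual riders than for members would be consistent with casual rides being concentrated at a few popular locations. With the real data, rows with missing station names would need to be handled first.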