Cyclistic bike-share analysis case study
Cyclistic is a bike sharing program which features more than 5,800 bikes and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making it more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. It was founded in 2016 and has grown tremendously into a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Marketing team is interested in analyzing the Cyclistic historical bike trip data to identify trends.
You are a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members.
For data analysis three different tools will be used namely MS Excel for analyzing data separately for each month,R studio for analysing combined data as a whole year and finally tableau for visualization of the data.
Initially,started gathering and analyzing data using microsoft excel to get a general understanding.
Began downloading the data from divvy-tripdata, and turning the .csv files into excel spreadsheets.I downloaded the most recent year of data which was available at the time of starting my project. I choose the data from january 2023-december 2023 for my analysis.
Added two columns to all of the months:
ride_length calculated the total ride length for each trip using the start_at column which was: ending time minus starting time.
day_of_week calculated the day of the week for each trip using the start_at column date.
Went over the business task and the information I had at hand and how that could be used to figure out how members and casual riders use the bike service differently.
Came up with metrics to look at such as :
total number of rides per hour, per day of the month, per season, per day of the week, and for different bike types.
Average ride length between members and casual.
For every month in Excel created pivot tables and charts to go with the analysis on (this took the longest):
Total Rides per Weekday - calculated the total rides for members and casual and separated it by day of the week; used a cluster column chart.
Average Ride Length - calculated the average ride length for members and casual and separated it by day of the week; used a cluster column chart.
Total Rides per Hour - calculated the total rides for members and casual separated by the time of the day (24hr); used a line comparison chart.
Total Rides per Day - calculated the total rides for members and casual separated by the day of the month; used a line comparison chart.
Total Rides per Bike Type - calculated the total rides for members and casual separated by Bike type; used stacked column chart.
I used R to analyze the data because it could handle all of the information quicker than Excel. Below is my general process in R.
View my full code on my Github for this capstone project here.:-
Load all of the libraries I used: tidyverse, lubridate, hms, data.table
Uploaded all of the original data from the data source divytrip into R using read_csv function to upload all individual csv files and save them in separate data frames. For august 2023 data I saved it into aug08_df, september 2023 to sep09_df and so on.
Merged the 12 months of data together using rbind to create a one year view
Created a new data frame called cyclistic_date that would contain all of my new columns
Created new columns for:
Ride Length - did this by subtracting end_at time from start_at time
Day of the Week
Month
Day
Year
Time - convert the time to HH:MM:SS format
Hour
Season - Spring, Summer, Winter or Fall
Time of Day - Night, Morning, Afternoon or Evening
Cleaned the data by:
Removing duplicate rows
Remove rows with NA values (blank rows)
Remove where ride_length is 0 or negative (ride_length should be a positive number)
Remove unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_long, end_lat, end_lng
Calculated Total Rides for:
Total number of rides which was just the row count = 5707197
Member type - casual riders vs. annual members
Type of Bike - classic vs docked vs electric; separated by member type and total rides for each bike type
Hour - separated by member type and total rides for each hour in a day
Time of Day - separated by member type and total rides for each time of day (morning, afternoon, evening, night)
Day of the Week - separated by member type and total rides for each day of the week
Day of the Month - separated by member type and total rides for each day of the month
Month - separated by member type and total rides for each month
Season - separated by member type and total rides for each season (spring, summer, fall, winter)
Calculated Average Ride Length for:
Total average ride length
Member type - casual riders vs. annual members
Type of Bike - separated by member type and average ride length for each bike type
Hour - separated by member type and average ride length for each hour in a day
Time of Day - separated by member type and average ride length for each time of day (morning, afternoon, evening, night)
Day of the Week - separated by member type and average ride length for each day of the week
Day of the Month - separated by member type and average ride length for each day of the month
Month - separated by member type and average ride length for each month
Season - separated by member type and average ride lengths for each season (spring, summer, fall, winter)
Then using all of this data I created my own summary in my case notes and took note of the: total rides for each variable, average ride lengths for each variable, and the difference between members versus casual riders.
While I learned the basics of Tableau in the Google Course I wanted more practice with visualizing data and creating dashboards.
To view my completed dashboard click here.
I created a separate R code (you can view it here on Github) that made some changes for specifically the Tableau portion.
For ride length I rounded the digits by 1, meaning my numbers were 29.8 or 12.5.
Revised how I created my "month" column. I used mutate() to create a column that had the month in ___ format and not number format. So instead of 01 it would say "January"
Cleaned the data: removed rows with NA values, removed duplicate rows, removed where ride_length was 0 or negative and removed unnecessary columns like: ride_id, start_station_id, end_station_id, start_lat, start_long, end_lat, end_lng
Created a new dataframe with this information so I could test the difference between the original data frame (cyclistic_date) that I used for my analysis and the data frame I would use for Tableau (cyclistic_tableau).
In this new data frame I removed more columns to make calculations quicker in Tableau. I removed: start_station_name, end_station_name, time, started_at, ended_at
Downloaded this data frame into a .csv file which I uploaded to Tableau
Created graphs similar to those I created in Excel but added a few:
User Type (Figure 1)
Total Rides by Bike Type (Figure 3)
Ride Length by Weekday (Figure 2)
Total Rides by Weekday (Figure 4)
Total Rides by Hour (Figure 5)
Total Rides by Month (Figure 6)
Made minor edits to design elements and created final dashboard (Figure 7)
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 7
While I learned the basics of Tableau in the Google Course I wanted more practice with visualizing data and creating dashboards.
Total Rides recorded for the whole year of 2023 = 5.71 Million.
Members have used the service more than casual users.members contributed about 64% of the total rides and casual users contributed for the remaining 36%.
Average ride length recorded was 18.23 mins(including members and casual users).Based on user type casual riders have registered the longest ride of 28.30 minutes as compare to 12.56 minutes by member users.
Busiest weekday was saturday.Count of member users(471,659) was more than the casual users(409,904).
The busiest time of the day recorded was afternoon with a count of 2,512,547 rides.
August was registered as the busiest month with a total of 769,961 rides shared between member and casual riders.
Summer was the busiest season of the year with 2,253,883 rides or 39.49 % of total rides.With member riders accounted for 22.98 % of the total ride and casual riders accounted for 16.51% of the total rides for the summer.
Top three recommendations :-
Provide customized discounts and perks for members depending upon their profile like most travelled route,during what time of the day they use the service the most,average ridelength etc.
Encourage casual riders to signup to the service by providing discounted monthly/yearly subscriptions or providing addon services.
Incentive riders who travel during less busy hours or travel on less popular routes so that more riders opt for our service.
Start referral programs for existing users so if someone joins using their referral ,existing user gets ride credits which they can use for their next trip.
Encourage users to share their experience with service across all social media platforms.