Introduction
The project I chose to complete my Data Analytics Professional certificate is a case study on a fictional company, Cyclistic, a bike-share program based in Chicago with around 5,800 bikes and 600 docking stations. The company plans to maximize the number of annual members by creating marketing strategies that convert casual riders into members, and I was tasked with making data-driven recommendations for the marketing campaign. I will complete this analysis using the six phases of the data analysis process: ask, prepare, process, analyze, share, and act.
Although the scenario is fictitious, the data used for this project are real, collected between Dec 2020 and Nov 2021 from a bike-share program in Chicago. In this project I assume the role of a junior analyst. The team would like to know how annual members and casual riders differ, why casual riders would buy a membership, and how Cyclistic can use digital media to influence casual riders to become members. The team is interested in analyzing Cyclistic's historical bike trip data to identify trends in how casual riders and members use the bikes.
I. ASK
Business Objective
To increase profitability by converting casual riders to annual members via a targeted marketing campaign.
Business Task for Junior Analyst
The junior analyst has been tasked with answering this question: How do annual members and casual riders use Cyclistic bikes differently?
II. PREPARE
Where is Data Located?
The data used for this analysis were obtained from a company employed by the City of Chicago to collect data on bike share usage.
How is the Data Organized?
The data is organized in monthly CSV files. The most recent twelve months of data (Dec 2020 – Nov 2021) were used for this project. Each file consists of 13 columns containing the ride ID, rider type, ride start and end times, start and end locations, geographic coordinates, and related fields.
Credibility of the Data
The data is collected directly by the company that runs the Cyclistic Bike Share program for the City of Chicago. The data is comprehensive: it covers all the rides taken on the system, not just a sample. The data is also current. It is released monthly and, as of Jan 2022, was current to Nov 2021. The City of Chicago makes the data available to the public.
Licensing, privacy, security, and accessibility
This data is anonymized as it has been stripped of all identifying information. This ensures privacy, but it limits the extent of the analysis possible. There is not enough data to determine if casual riders are repeat riders or if casual riders are residents of the Chicago area. The data is released under this license.
Ability of Data to be used to answer Business Question
One of the fields in the data records the type of rider: casual riders pay for individual or daily rides, and member riders purchase an annual subscription. This information is crucial for determining how the two groups use the bike-share program differently.
Problems with the data
There are some problems with the data. Most of the problems (duplicate records, missing fields, etc.) can be dealt with by data cleaning, but a couple of issues require further explanation.
III. PROCESS & CLEAN
What tools are you choosing and why?
For this project I chose RStudio Desktop to clean and analyze the data and Tableau to create the visualizations. The data set was too large to be processed in spreadsheets.
Review of Data
The data was reviewed to get an overall understanding of the content of the fields and the data formats, and to ensure its integrity. The review involved checking column names across the 12 original files and checking for missing values, trailing white space, duplicate records, and other data anomalies.
The review of the data revealed several problems:
Duplicate record ID numbers
Records with missing start or end stations
Records with very short or very long ride durations
Records for trips starting or ending at an administrative station (repair or testing station)
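The four checks listed above could be expressed roughly as follows. This is a minimal Python sketch rather than the R code used for the project. The column names (`ride_id`, `started_at`, `ended_at`, `start_station_name`, `end_station_name`), the duration thresholds, and the administrative station names are all assumptions made for illustration, since the write-up does not state them.

```python
from collections import Counter
from datetime import datetime

# Assumed cut-offs and names; the write-up does not give the values actually used.
MIN_SECONDS = 60            # assumed lower bound for a plausible ride
MAX_SECONDS = 24 * 3600     # assumed upper bound for a plausible ride
ADMIN_STATIONS = {"REPAIR BAY", "TEST STATION"}  # placeholder station names
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"                # assumed timestamp format


def flag_problems(rows):
    """Map each problem category to the ride ids of the affected rows."""
    id_counts = Counter(row["ride_id"] for row in rows)
    problems = {
        "duplicate_id": [],
        "missing_station": [],
        "bad_duration": [],
        "admin_station": [],
    }
    for row in rows:
        ride_id = row["ride_id"]
        start = row["start_station_name"].strip()
        end = row["end_station_name"].strip()
        if id_counts[ride_id] > 1:
            problems["duplicate_id"].append(ride_id)
        if not start or not end:
            problems["missing_station"].append(ride_id)
        started = datetime.strptime(row["started_at"], TIME_FORMAT)
        ended = datetime.strptime(row["ended_at"], TIME_FORMAT)
        seconds = (ended - started).total_seconds()
        if seconds < MIN_SECONDS or seconds > MAX_SECONDS:
            problems["bad_duration"].append(ride_id)
        if {start.upper(), end.upper()} & ADMIN_STATIONS:
            problems["admin_station"].append(ride_id)
    return problems


def make_row(ride_id, started_at, ended_at, start_station, end_station):
    """Build one hypothetical trip record."""
    return {
        "ride_id": ride_id,
        "started_at": started_at,
        "ended_at": ended_at,
        "start_station_name": start_station,
        "end_station_name": end_station,
    }


# Made-up sample rows, one per problem category plus a repeated id.
SAMPLE_ROWS = [
    make_row("A1", "2021-06-01 08:00:00", "2021-06-01 08:15:00", "Station A", "Station B"),
    make_row("A1", "2021-06-01 08:00:00", "2021-06-01 08:15:00", "Station A", "Station B"),
    make_row("B2", "2021-06-01 08:00:00", "2021-06-01 08:10:00", "Station A", ""),
    make_row("C3", "2021-06-01 08:00:00", "2021-06-01 08:00:20", "Station A", "Station B"),
    make_row("D4", "2021-06-01 08:00:00", "2021-06-01 08:30:00", "Repair Bay", "Station B"),
]

problems = flag_problems(SAMPLE_ROWS)
for category, ride_ids in problems.items():
    print(category, ride_ids)
```

Flagging rows first, instead of deleting them immediately, makes it easy to count how many records each rule affects before deciding what to drop.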
Once the initial review was completed, all twelve files were loaded into one data frame. The resulting combined data set consisted of 4,731,081 rows with 13 columns of character and numeric data. This matched the total number of records in the twelve monthly files.
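The combine-and-verify step can be illustrated with a short Python sketch (the project itself used R). It assumes every monthly file shares the same header row; the file names and the two-column layout in the demonstration are hypothetical stand-ins for the real 13-column files.

```python
import csv
import tempfile
from pathlib import Path


def combine_monthly_files(paths):
    """Read CSV files into one list of rows, refusing files whose columns differ."""
    combined = []
    per_file_counts = {}
    expected_header = None
    for path in sorted(paths):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if expected_header is None:
                expected_header = reader.fieldnames
            elif reader.fieldnames != expected_header:
                raise ValueError(f"{path} has unexpected columns: {reader.fieldnames}")
            rows = list(reader)
        per_file_counts[Path(path).name] = len(rows)
        combined.extend(rows)
    return combined, per_file_counts


def write_demo_file(directory, name, rows):
    """Write a tiny made-up monthly file for the demonstration."""
    path = Path(directory) / name
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ride_id", "member_casual"])  # assumed column names
        writer.writerows(rows)
    return path


with tempfile.TemporaryDirectory() as tmp:
    demo_paths = [
        write_demo_file(tmp, "2020-12-trips.csv", [["A1", "member"], ["A2", "casual"]]),
        write_demo_file(tmp, "2021-01-trips.csv", [["B1", "casual"]]),
    ]
    all_rows, counts = combine_monthly_files(demo_paths)

# Sanity check: the combined row count equals the sum of the per-file counts.
rows_match = len(all_rows) == sum(counts.values())
print(len(all_rows), counts, rows_match)
```

Holding several million rows as a list of dictionaries is memory-hungry, so for the full data set a data-frame library or a streaming pass over each file would be a more practical choice.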
IV. ANALYZE
Once the data was cleaned, analysis of the data was undertaken in RStudio to determine the following:
Mean, median, maximum and minimum ride duration (by rider type)
Average ride length by day and by rider type
Count of trips by rider type
Count of trips by bicycle type
Count of the number of start stations
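As an illustration of the summary measures listed above, the following Python sketch computes duration statistics and trip counts per group. The records are invented for the example and are not results from the Cyclistic data; the rider and bike type labels are assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Made-up cleaned records: (rider_type, bike_type, start_station, ride_minutes)
RIDES = [
    ("member", "classic_bike", "Station A", 10.0),
    ("member", "electric_bike", "Station B", 14.0),
    ("member", "classic_bike", "Station A", 12.0),
    ("casual", "docked_bike", "Station C", 30.0),
    ("casual", "electric_bike", "Station A", 20.0),
]


def duration_summary(rides):
    """Mean, median, min, max and trip count of ride minutes per rider type."""
    minutes_by_type = defaultdict(list)
    for rider_type, _bike, _station, minutes in rides:
        minutes_by_type[rider_type].append(minutes)
    return {
        rider_type: {
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
            "trips": len(values),
        }
        for rider_type, values in minutes_by_type.items()
    }


summary = duration_summary(RIDES)
trips_by_bike = Counter(bike for _rider, bike, _station, _minutes in RIDES)
start_station_count = len({station for _rider, _bike, station, _minutes in RIDES})

print(summary)
print(trips_by_bike)
print(start_station_count)
```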
The cleaned data set was exported from RStudio as a CSV file and imported into Tableau for further analysis and for creating the visualizations.
Tableau was used to further analyze the data and determine:
Ride duration
Times of Day for rides
Days of the week for rides
Months of the year of the rides
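The time-based breakdowns above were produced in Tableau. As a rough illustration of the same grouping logic, this Python sketch counts trips per rider type by hour of day, day of week, and month. The timestamps are invented and the timestamp format is an assumption.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up (rider_type, start time) pairs.
STARTS = [
    ("casual", "2021-07-03 14:05:00"),  # a Saturday
    ("casual", "2021-07-04 15:30:00"),  # a Sunday
    ("member", "2021-07-05 08:10:00"),  # a Monday
    ("member", "2021-11-09 17:45:00"),  # a Tuesday
]


def trip_counts(starts):
    """Count trips per rider type by start hour, weekday name and month number."""
    by_hour, by_weekday, by_month = Counter(), Counter(), Counter()
    for rider_type, stamp in starts:
        started = datetime.strptime(stamp, TIME_FORMAT)
        by_hour[(rider_type, started.hour)] += 1
        by_weekday[(rider_type, WEEKDAYS[started.weekday()])] += 1
        by_month[(rider_type, started.month)] += 1
    return by_hour, by_weekday, by_month


by_hour, by_weekday, by_month = trip_counts(STARTS)
print(by_weekday)
print(by_month)
```

Keying each counter on the pair of rider type and time bucket keeps casual and member rides separate, which is the comparison the business task asks for.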
V. SHARE
Detailed documentation of the R code is available on Kaggle, and interactive visualizations are available on Tableau Public.
VI. ACT
Top Three Recommendations
Based on an analysis of the data, the following recommendations can be made to the Cyclistic stakeholders:
Run a weekend promotion: riders who ride for more than 12 minutes on a weekend earn a percentage discount on their next weekday ride.
Run a discount program for member riders during the peak months of July through September.
Run a discount program for casual riders on weekdays to increase the number of casual riders on weekdays as well.