This is the final project for the Google Data Analytics Professional Certificate course offered on Coursera. It is a case study aimed at solving a business problem: using data analysis to recommend a marketing strategy that maximizes the number of annual memberships for a fictional company named Cyclistic. Cyclistic offers bicycle rental services in the city of Chicago, United States. To achieve this, we aim to understand how annual members and casual riders use Cyclistic bikes differently, following the data analysis process: ask, prepare, process, analyze, share, and act.
Keywords: Cyclistic bike-share, marketing, membership, Chicago.
Tools: Excel, LaTeX, Python, Jupyter Notebook, DataCamp Workspace, ChatGPT, HTML.
Problem description
Cyclistic is a bike-share company that has offered its services in Chicago, United States, since 2016. The bike-share program comprises 5,800 bicycles and 600 docking stations. The company has two types of customers, grouped by the pricing plans available:
Casual riders: Customers who use single-ride passes or full-day passes.
Cyclistic members: Customers who purchase an annual membership.
The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. The Cyclistic marketing analytics team wants to understand how casual riders and annual members use Cyclistic bikes differently.
Target population
Key stakeholders:
Lily Moreno, the director of marketing and the manager.
Cyclistic marketing analytics team.
Cyclistic executive team.
Population to analyze: Historical Cyclistic trip data of Chicago citizens using the Cyclistic bike-share system.
Selected sample: Cyclistic trip data which runs from July 2022 to June 2023.
Business Goal
Design a marketing strategy for converting casual riders into annual members, thereby maximizing the number of annual memberships.
Business task
Determine how annual members and casual riders use Cyclistic bikes differently.
Analyzing User Behavior Regarding Time.
Analyzing User Behavior Regarding Bicycle Type.
Analyzing User Behavior Regarding Location.
Problem statement
How do annual members and casual riders use Cyclistic bikes differently?
We will explore the following:
User behavior regarding time:
What is the average trip duration?
At what times of the day are trips most frequent?
Which days of the week have the most trips?
What is the travel behavior like in each season of the year?
User behavior regarding bicycle type: Which do they prefer: classic, electric, or docked bicycles?
User behavior regarding location: What kind of places do they frequent?
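As a rough illustration, the time and bicycle-type questions above could map onto pandas aggregations along these lines. This is a minimal sketch, not the notebook's actual code: the sample rows are made up, the column names member_casual and rideable_type are assumed, and the month-to-season mapping (meteorological seasons) is an assumption as well.

```python
import pandas as pd

# Made-up sample rows; member_casual and rideable_type are assumed column names.
trips = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "casual"],
    "rideable_type": ["classic_bike", "electric_bike", "electric_bike", "docked_bike"],
    "started_at": pd.to_datetime([
        "2022-07-05 08:00:00", "2023-01-10 17:00:00",
        "2022-07-09 14:00:00", "2022-10-15 11:00:00",
    ]),
    "ended_at": pd.to_datetime([
        "2022-07-05 08:10:00", "2023-01-10 17:20:00",
        "2022-07-09 14:40:00", "2022-10-15 12:00:00",
    ]),
})

# Assumed season definition: meteorological seasons of the northern hemisphere.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Derived columns used by the time-related questions.
trips["ride_minutes"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60
trips["hour"] = trips["started_at"].dt.hour
trips["day_of_week"] = trips["started_at"].dt.day_name()
trips["season"] = trips["started_at"].dt.month.map(SEASONS)

# Average trip duration per rider type.
avg_duration = trips.groupby("member_casual")["ride_minutes"].mean()

# Trip counts per rider type by hour of day, day of week, and season.
trips_by_hour = trips.groupby(["member_casual", "hour"]).size()
trips_by_day = trips.groupby(["member_casual", "day_of_week"]).size()
trips_by_season = trips.groupby(["member_casual", "season"]).size()

# Share of each bicycle type within each rider type (each row sums to 1).
bike_share = pd.crosstab(trips["member_casual"], trips["rideable_type"], normalize="index")
```

Each result is indexed by rider type, so members and casual riders can be compared side by side.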
Methodology
The cloud-based programming environment called "Workspace" from DataCamp will be used to create a Python notebook for data analysis. DataCamp is an online educational platform focused on data science. "Workspace" provides access to powerful cloud-based hardware, is similar to Jupyter Notebook, and features OpenAI's AI for data science tasks. This is necessary because the data to be analyzed is large.
Two data sources will be used. First, Trip data, which contains information about the trips taken, including the type of bicycle, the start and end times of the trip, the departure and arrival stations, including geographical points, the type of rider (casual or members), among other details. Second, Bicycle Station data, where you can find the station names, locations, total number of docks, and so on.
The data is stored in the Amazon Web Services (AWS) cloud, specifically in the Simple Storage Service (S3). You can access it through this link. The data covers the period from 2013 to 2023, split by semester, quarter, or month. The analysis uses the most recent 12 months of Cyclistic trip data, from July 2022 to June 2023. The data includes the following variables:
Cleaning Bicycle Stations data
The ID column is checked for uniqueness. There are no duplicates.
The ID variable is converted to a string.
The data type of each column is checked. All columns are consistent.
The Public Rack column is created to flag the stations of that type. Its value is boolean.
The character string 'Public Rack - ' is removed from the start_station_name column in the Divvy_Bicycle_Stations file.
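The station-cleaning steps above can be sketched in pandas roughly as follows. This is a minimal sketch under stated assumptions: the sample rows are made up, and the name column is called "Station Name" here only for illustration, so the real file's schema may differ.

```python
import pandas as pd

# Made-up rows standing in for the Divvy_Bicycle_Stations file.
# "Station Name" and "Total Docks" are assumed column names.
stations = pd.DataFrame({
    "ID": [101, 102, 103],
    "Station Name": [
        "Public Rack - Example St & 1st Ave",
        "Streeter Dr & Grand Ave",
        "Millennium Park",
    ],
    "Total Docks": [2, 47, 35],
})

PREFIX = "Public Rack - "


def clean_stations(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of the stations table."""
    out = df.copy()

    # Step 1: the ID column must not contain duplicates.
    if not out["ID"].is_unique:
        raise ValueError("duplicate station IDs found")

    # Step 2: store the ID as a string rather than a number.
    out["ID"] = out["ID"].astype(str)

    # Step 3: flag public racks before the prefix is stripped.
    out["Public Rack"] = out["Station Name"].str.startswith(PREFIX)

    # Step 4: remove the prefix, anchored at the start of the name.
    out["Station Name"] = out["Station Name"].str.replace(
        r"^Public Rack - ", "", regex=True
    )
    return out


cleaned_stations = clean_stations(stations)
# Per-column data types can then be inspected with cleaned_stations.dtypes.
```

The flag is computed before the prefix is removed, because the information would otherwise be lost.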
Cleaning Trip data
The ride_id column is checked for uniqueness. There are no duplicates.
The number of null values is counted for each column in all dataframes. Around 15% of the trips lack a starting or ending station. The possible reasons for this are explained. These columns are kept.
The character string 'Public Rack - ' is removed from the start_station_name and end_station_name columns in the monthly files.
The start_station_id and end_station_id columns show inconsistencies. The correct station name is looked up in the Divvy_Bicycle_Stations file and the correction is made.
The unidentified stations are listed.
The data type of each column is checked. All columns are consistent.
The start and end dates of each trip are verified to fall within the established month ranges.
The ride_length column is created. Its value is obtained by subtracting started_at from ended_at. The format is set as HH:MM:SS.
The day_of_week column is created. It holds the day on which the ride started.
Trips less than 1 minute in duration are excluded.
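Several of the trip-cleaning steps above (the uniqueness check, null counts, prefix removal, ride_length, day_of_week, and the one-minute filter) could be sketched in pandas as shown below. This is a minimal sketch with made-up rows: the helper names format_hms and clean_trips are hypothetical, and only the column names come from the steps listed.

```python
import pandas as pd

# Made-up rows; the second trip lasts 40 seconds and has no station names.
trips = pd.DataFrame({
    "ride_id": ["A1", "B2", "C3"],
    "started_at": ["2022-07-01 08:00:00", "2022-07-02 17:30:00", "2022-07-03 12:00:00"],
    "ended_at": ["2022-07-01 08:25:30", "2022-07-02 17:30:40", "2022-07-03 13:10:00"],
    "start_station_name": ["Public Rack - Example St & 1st Ave", None, "Millennium Park"],
    "end_station_name": ["Millennium Park", None, "Public Rack - Example St & 1st Ave"],
})


def format_hms(delta: pd.Timedelta) -> str:
    """Format a non-negative duration as HH:MM:SS (hypothetical helper)."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of one monthly trip table."""
    out = df.copy()

    # ride_id must not contain duplicates.
    if not out["ride_id"].is_unique:
        raise ValueError("ride_id contains duplicates")

    # Strip the 'Public Rack - ' prefix; missing names stay missing.
    for col in ("start_station_name", "end_station_name"):
        out[col] = out[col].str.replace(r"^Public Rack - ", "", regex=True)

    # Parse timestamps so durations and weekdays can be derived.
    out["started_at"] = pd.to_datetime(out["started_at"])
    out["ended_at"] = pd.to_datetime(out["ended_at"])
    duration = out["ended_at"] - out["started_at"]

    out["ride_length"] = duration.map(format_hms)
    out["day_of_week"] = out["started_at"].dt.day_name()

    # Exclude trips shorter than one minute (this also drops negative durations).
    keep = duration >= pd.Timedelta(minutes=1)
    return out[keep].reset_index(drop=True)


# Null values per column, counted before any rows are dropped.
null_counts = trips.isna().sum()
cleaned_trips = clean_trips(trips)
```

In this sketch the rows with missing station names are kept unless they fail the duration filter, in line with the decision to keep those columns.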
You can access the complete .ipynb file of the data cleaning in this DataCamp workspace.
You can access the complete .ipynb file of the exploratory data analysis in this DataCamp workspace.
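For the station ID correction mentioned in the trip-data cleaning, one possible reading is sketched below. It assumes that the station name is the trusted key, that the canonical ID is looked up by name in the stations file, and that names without a match are reported as unidentified. The helper repair_station_ids, the column names station_id and station_name, and all sample values are hypothetical. The notebook may resolve the inconsistency differently.

```python
import pandas as pd

# Made-up reference table standing in for Divvy_Bicycle_Stations.
stations = pd.DataFrame({
    "station_id": ["S1", "S2"],
    "station_name": ["Streeter Dr & Grand Ave", "Millennium Park"],
})

# Made-up trips: the second row carries an inconsistent ID for a known name,
# and the third row refers to a station missing from the reference table.
trips = pd.DataFrame({
    "ride_id": ["A1", "B2", "C3"],
    "start_station_id": ["S1", "S1.0", "X9"],
    "start_station_name": [
        "Streeter Dr & Grand Ave",
        "Streeter Dr & Grand Ave",
        "Unknown Corner",
    ],
})


def repair_station_ids(trips_df, stations_df, id_col, name_col):
    """Return (corrected copy, sorted list of station names not found)."""
    # Name -> canonical ID lookup; station names must be unique here.
    lookup = stations_df.set_index("station_name")["station_id"]

    out = trips_df.copy()
    canonical = out[name_col].map(lookup)

    # Use the canonical ID where the name is known; otherwise keep the original.
    out[id_col] = canonical.fillna(out[id_col])

    # Names present in the trips but absent from the reference table.
    missing = canonical.isna() & out[name_col].notna()
    unidentified = sorted(out.loc[missing, name_col].unique())
    return out, unidentified


fixed_trips, unidentified = repair_station_ids(
    trips, stations, "start_station_id", "start_station_name"
)
```

The same call with end_station_id and end_station_name would cover the arrival side.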
These are the top four recommendations based on the analysis:
Targeting School Holiday Periods: As observed in the analysis, the proportion of casual cyclists increases during school holidays. Cyclistic can run specific marketing campaigns to attract vacationing families and tourists with special offers for annual memberships during these high-traffic seasons.
Annual Memberships as Daily Commute Solutions: Since members mainly use bicycles during the week and at peak working hours (8:00 am and 5:00 pm), Cyclistic can emphasize the cost-effectiveness and convenience of annual memberships for daily commuting. Highlight how members avoid peak-hour traffic and save money on transportation.
User Education: Create informative content and workshops about the benefits of using electric bicycles, especially for casual users who prefer them. Educating users about the advantages of electric bicycles, such as less effort and greater distance coverage, can attract new annual members.
Marketing Campaigns During High Casual Cyclist Traffic Times: Use location-based marketing to target occasional cyclists in areas they frequently visit, such as Streeter Dr & Grand Ave, DuSable Lake Shore Dr & Monroe St, Millennium Park, etc., especially on weekends when they ride more frequently. Offer promotions and incentives to convert them into annual members.
Coursera. (2023). Case Study: How Does a Bike-Share Navigate Speedy Success? Retrieved November 4, 2023, from https://www.coursera.org/professional-certificates/google-data-analytics#outcomes
Amazon Web Services. (2022). Divvy-tripdata. Retrieved November 4, 2023, from https://divvy-tripdata.s3.amazonaws.com/index.html
DataCamp. (2023). What is DataCamp Workspace? Retrieved November 4, 2023, from https://workspace-docs.datacamp.com/
Overleaf. (2023). What is Overleaf? Retrieved November 4, 2023, from https://www.overleaf.com/
In this section, I provide the links to the online resources created:
Data: https://drive.google.com/drive/folders/1EPmAbQq9qzmEpwu1oAEP8CRvVbpSoKyc?usp=drive_link
Cleaned data: https://drive.google.com/drive/folders/1cDT12bYdvmk23jUFiESkZEmaVz0kncoh?usp=drive_link
Data cleaning in DataCamp Workspace (Download File): https://app.datacamp.com/workspace/w/71936b08-7ef1-4430-b2f9-ae57c08b764c/edit/data_Cleaning.ipynb
Exploratory Data Analysis in DataCamp Workspace (Download File): https://app.datacamp.com/workspace/w/71936b08-7ef1-4430-b2f9-ae57c08b764c/edit/exploratory_Data_Analysis.ipynb
Technical report in Overleaf (Download File): https://www.overleaf.com/read/mqjmfsqbvvph#c985e7