This site was created to pass the last part of my Google Data Analytics Certification.
To get through the final part of the course, I was tasked with a case study: performing data analysis to help answer business questions.
I used a combination of R scripting and Power BI to arrive at the results.
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago.
The director of marketing believes the company’s future success depends on maximizing the number of annual memberships.
Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members.
But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
I will address the above problem statement using the Google thought framework:
There are three main questions my analysis is supposed to help answer:
How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
Following the process presented during the course, I had to establish and confirm the following points:
Where is the data located? Data is located on the Amazon Web Server (link)
How is the data organized? The data comes as month-specific CSV files. All files contain information on the number of rides, type of user (member/casual), type of bikes rented, and where and when users started and finished their rides. Upon closer inspection, only 2021 has ride data for every month, so these are the files I used for further analysis.
Is the data ROCCC compliant? ROCCC stands for Reliable, Original, Comprehensive, Current, and Cited. I will look at each of those aspects separately to determine whether the data complies with it and provide a score (Low/Medium/High):
Reliable: Some of the station names and IDs are missing from the data; however, because geographical coordinates are present, we can map those if required. All other attributes are populated across all entries, hence the data can be assessed as reliable. (Medium)
Original: Data Owner and the provider is the Cyclistic company (High)
Comprehensive: I'm only considering data from 2021; however, the records date back to 2013. 2021 alone contains 5.5 million records, which makes for a reliable sample size. (High)
Current: At the time of analysis it is still 2022. I decided to use a whole year of data so that I could look at seasonality, and I consider 2021 still current and relevant. Once the full 2022 data is available, it can be used for a year-over-year comparison. (High)
Cited: Data is owned by the Cyclistic company, hence it's trustworthy (High)
Conclusion: Based on the above analysis, the dataset's credibility, integrity and quality are sufficient to provide insights that the business can use for decision making.
How are licensing, privacy, security, and accessibility addressed? The link was provided via Google and data is publicly available to anyone.
All of the downloaded data was saved locally and only the aggregated outcomes will be shared with the public.
For processing the data I used R scripting.
The step-by-step process, with explanations and code, is published on a dedicated GitHub Project site - Cursera_Capstone_Project
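As a rough illustration of this processing step, here is a minimal sketch of combining monthly files and deriving ride duration, weekday and distance. It is written in Python purely for illustration (my actual processing was done in R and lives in the GitHub project), it uses only the standard library, and the column names, the tiny in-memory "files" and the rule of dropping rides with a non-positive duration are all assumptions made for this example.

```python
import csv
import io
import math
from datetime import datetime

# Two tiny in-memory stand-ins for monthly CSV files.
# Column names and values are assumptions for this sketch, not real data.
JAN = """ride_id,rideable_type,started_at,ended_at,start_lat,start_lng,end_lat,end_lng,member_casual
a1,classic_bike,2021-01-02 08:00:00,2021-01-02 08:15:00,41.8900,-87.6200,41.9000,-87.6300,member
"""
FEB = """ride_id,rideable_type,started_at,ended_at,start_lat,start_lng,end_lat,end_lng,member_casual
b1,docked_bike,2021-02-06 14:00:00,2021-02-06 14:40:00,41.8923,-87.6120,41.8923,-87.6120,casual
b2,electric_bike,2021-02-07 10:00:00,2021-02-07 09:59:00,41.9000,-87.6300,41.9000,-87.6300,casual
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def load_rides(sources):
    """Merge monthly CSV texts into one list and add derived fields."""
    rides = []
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            start = datetime.strptime(row["started_at"], TIME_FORMAT)
            end = datetime.strptime(row["ended_at"], TIME_FORMAT)
            minutes = (end - start).total_seconds() / 60
            if minutes <= 0:
                # Assumed cleaning rule: skip rides that end before they start.
                continue
            rides.append({
                "ride_id": row["ride_id"],
                "rider": row["member_casual"],
                "bike": row["rideable_type"],
                "minutes": minutes,
                "weekday": WEEKDAYS[start.weekday()],
                "distance_km": haversine_km(
                    float(row["start_lat"]), float(row["start_lng"]),
                    float(row["end_lat"]), float(row["end_lng"])),
            })
    return rides


rides = load_rides([JAN, FEB])
for ride in rides:
    print(ride["ride_id"], ride["rider"], ride["weekday"],
          round(ride["minutes"], 1), round(ride["distance_km"], 2))
```

Note that the distance here is the straight line between start and end coordinates, so a round trip that ends where it started counts as zero kilometers.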
For the analysis and sharing part of the framework, I have decided to go with Power BI.
Note: the dashboard below is interactive
Open Dashboard in a new window for a better experience and visibility
Observations
Members account for 57% of all bike rentals in 2021, yet there is plenty of potential for improving revenue by converting casual users, who account for 43% of rides.
Weekends are the busiest for both members and casuals, with Saturday being a peak day.
We can observe that members' usage of bikes is steady across the whole week.
The average ride duration is 20.5 minutes. Both members and casuals take their longest rides on Sundays.
The average ride distance is 2.3 kilometers without any meaningful spikes across the week.
The most preferred type of bike appears to be the classic bike and this is for both types of users.
Docked bikes are only used by casual riders.
Looking across the whole year, the biggest spikes in rentals fall in the summer months (June, July, and August).
Analyzing hourly trends:
Members: rentals spike in the morning (6 am - 8 am) and again in the afternoon (3 pm - 6 pm), which may lead to the conclusion that they are using bikes as their primary means of commuting to and from work.
Casuals: usage builds steadily from 6 am until 6 pm.
There is almost no usage of bikes between midnight and 6 am.
The most popular start and end station overall is Streeter Dr & Grand Ave; however, the choices differ between Members and Casual users:
Members tend to start and end at the Clark St & Elm St station, whereas Casuals mostly use Streeter Dr & Grand Ave.
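The hourly trends above come from counting rides per start hour for each rider type. A minimal Python sketch of that counting is shown below (again illustrative only, since the real analysis was done in R and Power BI). The sample rides are made up for the example, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up sample of (start time, rider type) pairs, for illustration only.
sample_rides = [
    ("2021-07-05 07:30:00", "member"),
    ("2021-07-05 17:10:00", "member"),
    ("2021-07-10 13:45:00", "casual"),
    ("2021-07-10 15:05:00", "casual"),
    ("2021-07-10 17:20:00", "member"),
]


def rides_per_hour(rides):
    """Count rides per (rider type, start hour)."""
    counts = Counter()
    for started_at, rider in rides:
        hour = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S").hour
        counts[(rider, hour)] += 1
    return counts


def peak_hours(counts, rider):
    """Return the start hour(s) with the most rides for one rider type."""
    per_hour = {h: n for (r, h), n in counts.items() if r == rider}
    top = max(per_hour.values())
    return sorted(h for h, n in per_hour.items() if n == top)


hourly_counts = rides_per_hour(sample_rides)
print("member peak hours:", peak_hours(hourly_counts, "member"))
print("casual peak hours:", peak_hours(hourly_counts, "casual"))
```

The same grouping, with the weekday or the month in place of the hour, would give the weekly and seasonal views.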
Findings:
From all of the trends, it appears that Casual riders use the service mostly for recreation, whereas Members use bikes as their main means of commuting.
Furthermore, Casual riders are most active during weekends and summertime.
The average ride length for Casuals is also longer than that of Members.
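The comparison between the two rider types boils down to two numbers per group: the share of all rides and the average ride duration. A minimal Python sketch of that summary follows; the sample durations are invented for the example and do not reproduce the real figures, and `summarize` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up sample of (rider type, ride duration in minutes), illustration only.
sample_rides = [
    ("member", 12.0),
    ("member", 10.0),
    ("member", 14.0),
    ("casual", 30.0),
    ("casual", 26.0),
]


def summarize(rides):
    """Return share of rides (%) and mean duration (minutes) per rider type."""
    totals = defaultdict(lambda: [0, 0.0])  # rider -> [ride count, total minutes]
    for rider, minutes in rides:
        totals[rider][0] += 1
        totals[rider][1] += minutes
    n_rides = len(rides)
    return {
        rider: {
            "share_pct": round(100 * count / n_rides, 1),
            "avg_minutes": round(total / count, 1),
        }
        for rider, (count, total) in totals.items()
    }


summary = summarize(sample_rides)
for rider, stats in summary.items():
    print(rider, stats)
```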
Recommendations:
To influence casual riders into buying the annual memberships, the company should consider the following strategies:
Adjust pricing by increasing the hourly / range price for non-members during weekends and summertime.
This may later be used to show Casual users that, in the longer term, becoming a member is more cost-efficient.
Expand this analysis further by including financial data and calculating time and money saved on commute via bike vs car or public transport.
Focus advertising efforts around the busiest stations.
The seasonality across the week, the year, and the hours of the day should be an important cue for the marketing team when scheduling marketing actions.
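The cost-efficiency argument for membership can be expressed as a simple break-even calculation: how many casual rides per year would cost as much as an annual membership. The Python sketch below is only an illustration; the prices are hypothetical placeholders, not Cyclistic's actual pricing, and no financial data was part of this analysis.

```python
import math


def break_even_rides(annual_fee, casual_price_per_ride):
    """Smallest number of rides per year at which paying per ride
    costs at least as much as the annual membership."""
    if annual_fee < 0 or casual_price_per_ride <= 0:
        raise ValueError("fee must be non-negative and ride price positive")
    return math.ceil(annual_fee / casual_price_per_ride)


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_ANNUAL_FEE = 120.0
HYPOTHETICAL_RIDE_PRICE = 4.0

rides_needed = break_even_rides(HYPOTHETICAL_ANNUAL_FEE, HYPOTHETICAL_RIDE_PRICE)
print("Membership pays off from", rides_needed, "rides per year")
```

A higher weekend or summer price for non-members lowers this break-even point, which is the effect the first recommendation relies on.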