This case study was my final project for the Google Data Analytics certificate on Coursera.
Its purpose is to bring together and refine the skills and knowledge gained throughout the courses and activities, so that I can start working as a data analyst.
The finance analyst team concluded that annual members are the most profitable group at Divvy. Our task is to identify the differences between casual riders and Divvy members, and the factors that predispose a client to become a member rather than remain a casual rider.
With that knowledge, the marketing team can design a campaign targeting those key points to increase the number of annual memberships, which should increase the company's profits.
The data comes from Divvy, the company that operates the City of Chicago’s bicycle sharing service. It is first-party data, meaning that Divvy itself collects and updates the data we will be working with.
We will analyze the last 12 months of ride_trips data (03/2022 – 02/2023), which includes trip durations, stations (locations), membership types, the type of bike used, and geographical coordinates.
Finally, we can use this data with confidence because it is openly available and can be shared for non-profit purposes, as stated in its license.
After downloading the CSV files, I stored them on my computer using subfolders and consistent naming conventions. I then used Power Query in Excel to import and load each CSV file into an Excel workbook, where I could work with the data in table format.
Now that I had my data in table format rather than as raw comma-separated text, I started the data cleaning process.
TOOLS USED: Excel and RStudio.
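The cleaning was done in Excel and RStudio, and the individual steps are not listed in this write-up. Purely as an illustration, the Python sketch below shows how monthly files could be merged and checked. The column names (ride_id, started_at, ended_at, member_casual), the timestamp format, and the three checks (duplicate IDs, missing timestamps, non-positive durations) are assumptions made for the example, not a record of the steps actually taken, and the sample rows are made up.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def load_rides(csv_texts):
    """Merge several CSV texts into one list of cleaned ride records."""
    rides = []
    seen_ids = set()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            ride_id = row.get("ride_id", "")
            # Skip rows without an ID and rows already seen in another file.
            if not ride_id or ride_id in seen_ids:
                continue
            # Skip rows with a missing start or end timestamp.
            if not row.get("started_at") or not row.get("ended_at"):
                continue
            start = datetime.strptime(row["started_at"], TIME_FORMAT)
            end = datetime.strptime(row["ended_at"], TIME_FORMAT)
            minutes = (end - start).total_seconds() / 60
            # A ride that does not end after it starts is treated as bad data.
            if minutes <= 0:
                continue
            seen_ids.add(ride_id)
            rides.append({
                "ride_id": ride_id,
                "rider_type": row["member_casual"],
                "started_at": start,
                "ride_minutes": minutes,
            })
    return rides


# Made-up sample rows standing in for two monthly files.
march = """ride_id,started_at,ended_at,member_casual
A1,2022-03-01 08:00:00,2022-03-01 08:12:00,member
A2,2022-03-05 14:00:00,2022-03-05 13:50:00,casual
"""
april = """ride_id,started_at,ended_at,member_casual
A1,2022-03-01 08:00:00,2022-03-01 08:12:00,member
B1,2022-04-02 15:00:00,2022-04-02 15:30:00,casual
"""
rides = load_rides([march, april])
print(len(rides), [r["ride_minutes"] for r in rides])
```

With real files, each text would come from reading one monthly CSV from disk instead of an inline string.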
Once the data cleaning process was finished, it was time to start the analysis, formulate a hypothesis, and see what story the data would tell.
But first I had to choose which tool would help me in this endeavour. As I prefer to see visuals and how they explain facts, I decided that Tableau was the way to go.
So, keeping our business objectives in mind, I started to search for answers to the questions I had formulated during the previous steps.
AVERAGE RIDE LENGTH and USERS OVER THE LAST 12 MONTHS
Having made those two visualizations, I noticed a few things.
Casual riders took longer trips on average than members.
Overall, members made up a much larger proportion of clients than casual riders.
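The charts themselves were built in Tableau. As a rough equivalent, the grouping behind them can be sketched in Python: count rides and average the ride length per month and rider type. The tuple layout (start time, rider type, minutes) is an assumption for the example, the ride count is used here as a stand-in for the users chart, and the sample values are invented, not results from the study.

```python
from collections import defaultdict
from datetime import datetime


def monthly_summary(rides):
    """Return {(month, rider_type): (ride_count, mean_minutes)}."""
    totals = defaultdict(lambda: [0, 0.0])  # [count, summed minutes]
    for started_at, rider_type, minutes in rides:
        key = (started_at.strftime("%Y-%m"), rider_type)
        totals[key][0] += 1
        totals[key][1] += minutes
    return {key: (n, total / n) for key, (n, total) in totals.items()}


# Invented sample rides: (start time, rider type, ride length in minutes).
sample = [
    (datetime(2022, 7, 2, 15, 0), "casual", 30.0),
    (datetime(2022, 7, 3, 11, 0), "casual", 20.0),
    (datetime(2022, 7, 4, 8, 0), "member", 12.0),
    (datetime(2022, 7, 5, 8, 0), "member", 10.0),
    (datetime(2022, 7, 5, 17, 0), "member", 14.0),
]
summary = monthly_summary(sample)
for key in sorted(summary):
    print(key, summary[key])
```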
Then, if we think about why our clients use bikes, we can connect some dots and form a hypothesis.
Members pay an annual fee, so why would they pay annually for a bike service?
They need to move around the city frequently, on a daily basis.
Casual riders may not have that need, so they may instead use bikes to visit other areas or enjoy the day, and they likely have other means of transport.
Finally, the months with the most riders indicate the seasons in which people use bikes most often. I don't think this reveals a clear difference between the two types of riders, as they show a similar pattern and differ only in the size of each population. Weather and temperature affect all bike users in the same way, since conditions can seriously affect both the practicality and the comfort of cycling. Chicago is a big city with multiple transport options, so riders have alternatives in bad weather. That would explain why the numbers drop in those months compared with spring and summer. Those drops may also be partly explained by holidays.
THESIS
The majority of members use bikes to commute between home and work.
The majority of casual riders use bikes for enjoyment, social or family activities, and the like.
Following that track, I wanted to know a few things:
At which hours do riders use the bike service the most?
On which days of the week do riders use it the most?
As we can see in the visualization:
Member rides peak from Monday to Thursday and fall from Friday to Sunday.
Their peak hours are 8 am in the morning and 5 pm (17h) in the afternoon.
Casual rides peak at the weekend, rather than on weekdays as in the members' case.
They have no morning peak hour, but the number of riders increases progressively through the morning. Their afternoon peak is also at 5 pm (17h).
Member numbers are greater than casual numbers, so we have many more member clients than casual clients.
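The peaks above were read off the Tableau visualization. A minimal Python sketch of the same idea is to count rides per weekday and per start hour for each rider type and take the most frequent value. The input layout (start time, rider type) is an assumption for the example, and the sample rides are invented to show the mechanics only.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def peak_usage(rides):
    """Return {rider_type: (busiest_weekday, busiest_start_hour)}."""
    by_day = {}
    by_hour = {}
    for started_at, rider_type in rides:
        # datetime.weekday() gives 0 for Monday through 6 for Sunday.
        day_name = DAYS[started_at.weekday()]
        by_day.setdefault(rider_type, Counter())[day_name] += 1
        by_hour.setdefault(rider_type, Counter())[started_at.hour] += 1
    return {
        rider_type: (
            by_day[rider_type].most_common(1)[0][0],
            by_hour[rider_type].most_common(1)[0][0],
        )
        for rider_type in by_day
    }


# Invented sample rides: (start time, rider type).
sample = [
    (datetime(2022, 7, 5, 8, 0), "member"),    # Tuesday
    (datetime(2022, 7, 5, 17, 0), "member"),   # Tuesday
    (datetime(2022, 7, 6, 17, 0), "member"),   # Wednesday
    (datetime(2022, 7, 2, 15, 0), "casual"),   # Saturday
    (datetime(2022, 7, 2, 17, 0), "casual"),   # Saturday
    (datetime(2022, 7, 3, 17, 0), "casual"),   # Sunday
]
peaks = peak_usage(sample)
print(peaks)
```

Taking only the single most frequent hour hides a second peak such as the members' 8 am one, so the full per-hour counts are worth inspecting as well.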
This viz gave me more data to back up my thesis, but I wanted to strengthen that support further, so I made the following graph.
We can observe that members' average ride length is constant throughout the weekdays. When that happens, my intuition is that clients are following a fixed route from Monday to Friday: in other words, going from home to work and from work to home.
For casual riders, the average ride time varies more across the weekdays. We cannot rule out that some casual riders also use the service to go to work, but they are not apparent in the data. With an average ride time almost double that of members, the majority of casual riders probably use bikes for activities that are less regular than those of members.
Furthermore, casual riders' average ride duration peaks at weekends. These are days that are not workdays for most people, so riders have more time to dedicate to leisure activities, which can explain the increase in average ride time for both groups.
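One way to put a number on "constant" versus "more varied" is to compute the average ride length per day of the week for each rider type and compare the gap between the highest and lowest daily average. The Python sketch below does that under an assumed input layout (weekday index with 0 = Monday, rider type, minutes). The sample values are invented and are not the figures from the graph.

```python
from collections import defaultdict
from statistics import mean


def weekday_spread(rides):
    """Return {rider_type: (daily_means, max_mean_minus_min_mean)}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for weekday, rider_type, minutes in rides:
        grouped[rider_type][weekday].append(minutes)
    result = {}
    for rider_type, days in grouped.items():
        daily = {day: mean(values) for day, values in days.items()}
        result[rider_type] = (daily, max(daily.values()) - min(daily.values()))
    return result


# Invented sample rides: (weekday index, rider type, ride length in minutes).
sample = [
    (0, "member", 12.0), (0, "member", 10.0),
    (2, "member", 11.0), (5, "member", 13.0),
    (0, "casual", 18.0), (2, "casual", 22.0),
    (5, "casual", 30.0), (5, "casual", 34.0),
]
spread = weekday_spread(sample)
for rider_type, (daily, gap) in spread.items():
    print(rider_type, daily, gap)
```

A small gap points to a stable daily routine, while a large one points to rides whose purpose changes from day to day.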
Viz and dashboard I made on Tableau --> https://public.tableau.com/views/Bike_sharing_study_case/DivyBikeSharingService?:language=es-ES&:display_count=n&:origin=viz_share_link
We can categorize riders into two major groups:
They use bikes regularly as transport, moving through the city to reach a set destination. Constant CLIENTS.
They use bikes for leisure activities and irregular transport. Spontaneous CLIENTS.
The majority of annual members are constant clients: they use bikes more often than casual riders, for set purposes such as commuting to work.
The majority of casual riders, who buy day passes or single rides, are spontaneous clients: they use bikes for leisure activities and purposes that are harder to track. They spend more time on a single ride rather than taking multiple rides.
So, I believe that to increase the number of annual members, we have to increase the number of rides that casual clients take. The aim is not longer rides, but more interest among those clients in using the bike sharing service more often.
Clients who take more than "x" rides within a week or month could be rewarded with different benefits. Example: single rides that they can share with whomever they want, which could bring in a future client. I suggest further analysis of which type of benefit interests clients the most while having the least impact on company profits.
Organize general sport activities in GROUPS that combine leisure with a bit of exercise. Group activities increase people's sense of belonging to a place, and the majority of casual riders use bikes for enjoyment. Divvy can be part of their day by organizing weekend group activities where they ride bikes and enjoy some time together.
Collaborate with companies in the bike sharing service area and make Divvy bike services an employee benefit or discount (10% off the annual membership, etc.).