Product Analysis Using R
Cyclistic is a successful bike-share company. Cyclistic offers up to 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.
To reach broad consumer segments, Cyclistic offers flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
Cyclistic has determined that annual members are much more profitable than casual riders. The company believes that maximizing the number of annual members, by converting casual riders into members, will be key to future growth.
The stakeholders want to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how Cyclistic can use digital media to influence casual riders to become members.
Lily Moreno: The director of marketing. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program.
Cyclistic executive team: The detail-oriented executive team will decide whether to approve the recommended marketing program.
The data source used for this case study is Cyclistic's historical trip data. The data is made available by Motivate International Inc. Thirteen monthly datasets, from August 2021 to August 2022, were selected and downloaded in .csv format. From this year of data, I created a combined dataset containing 6,687,395 rows and 15 columns.
The datasets were published by Divvy Bikes. The license agreement grants a non-exclusive, royalty-free, limited, perpetual license to access, reproduce, analyze, copy, modify, distribute in your product or service and use the data for any lawful purpose.
According to the dataset information, data-privacy issues prohibit the use of riders' personally identifiable information. This means we cannot connect pass purchases to credit card numbers to determine whether casual riders have purchased multiple single passes. Each ride_id value is unique to a trip and does not identify the user, so it is not possible to track an individual rider's behavior or to determine how a given user uses their membership.
For this case study analysis the following datasets were chosen:
For this case study, I used R to clean the data. I've uploaded my R script to RPubs here.
Consistency: First I combined all the datasets into one data frame so that all the data would be transformed at the same time. I also used the "skimr" package to check for empty cells and for any deviations from the expected character count or variable type.
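A minimal sketch of this combining step in base R, assuming every monthly file shares the same columns. The two small data frames and their values are hypothetical stand-ins for the real .csv files, and the base-R checks below only approximate the kind of summary the "skimr" package reports.

```r
# Toy stand-ins for two monthly trip files (hypothetical values).
aug <- data.frame(
  ride_id = c("A1", "A2"),
  member_casual = c("member", "casual"),
  start_station_name = c("Clark St", NA),
  stringsAsFactors = FALSE
)
sep <- data.frame(
  ride_id = "B1",
  member_casual = "casual",
  start_station_name = "State St",
  stringsAsFactors = FALSE
)

# With real files, this list could come from something like
# lapply(list.files(pattern = "\\.csv$"), read.csv).
monthly <- list(aug, sep)

# Stack all months into one data frame; rbind requires matching column names.
all_trips <- do.call(rbind, monthly)

# Basic consistency checks: column types and missing values per column.
col_types <- sapply(all_trips, class)
na_counts <- colSums(is.na(all_trips))
print(col_types)
print(na_counts)
```

Combining first means every later transformation is written once and applied to all thirteen months together.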
Adapting columns: To support further analysis of the data frame, I separated the date column into four new columns: month, day, year and day of the week. I also created a new column, ride_duration, by applying the "difftime" function to the ended_at and started_at columns.
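The column changes described here could look roughly like the following in base R. The two sample trips are made up, the new column names other than ride_duration are my own choices, and minutes are assumed as the unit for "difftime".

```r
# Two hypothetical trips with start and end timestamps.
trips <- data.frame(
  started_at = as.POSIXct(c("2022-08-06 09:00:00", "2022-08-08 17:05:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-08-06 09:30:00", "2022-08-08 17:17:00"), tz = "UTC")
)

# Derive the date, then split it into separate columns.
trips$date        <- as.Date(trips$started_at)
trips$month       <- format(trips$date, "%m")
trips$day         <- format(trips$date, "%d")
trips$year        <- format(trips$date, "%Y")
trips$day_of_week <- format(trips$date, "%A")  # weekday name, locale-dependent

# Ride duration as a difftime object, expressed in minutes.
trips$ride_duration <- difftime(trips$ended_at, trips$started_at, units = "mins")

print(trips[, c("month", "day", "year", "day_of_week", "ride_duration")])
```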
Converting datatypes: I converted ride_duration from the 'difftime num' datatype to 'numeric' so that the program could find any zero or negative values.
Zero/Negative values: Because ride_duration was calculated with the "difftime" function, I had to make sure it contained no zero or negative values. After confirming that all zero and negative values were removed, I created a new table from the cleaned data.
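As an illustration of the conversion and filtering steps, here is a small base-R sketch. The four trips are invented so that one has zero duration and one has a negative duration; the name trips_clean is an assumption, not the name used in the original script.

```r
# Hypothetical trips: r2 has zero duration, r3 ends before it starts.
trips <- data.frame(
  ride_id    = c("r1", "r2", "r3", "r4"),
  started_at = as.POSIXct(c("2022-08-01 10:00:00", "2022-08-01 11:00:00",
                            "2022-08-01 12:00:00", "2022-08-01 13:00:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-08-01 10:10:00", "2022-08-01 11:00:00",
                            "2022-08-01 11:55:00", "2022-08-01 13:25:00"), tz = "UTC")
)

# difftime returns a 'difftime' object; convert it to plain numeric minutes.
trips$ride_duration <- difftime(trips$ended_at, trips$started_at, units = "mins")
trips$ride_duration <- as.numeric(trips$ride_duration)

# Count the invalid rows, then keep only trips with a positive duration.
n_bad <- sum(trips$ride_duration <= 0)
trips_clean <- trips[trips$ride_duration > 0, ]

print(n_bad)
print(trips_clean[, c("ride_id", "ride_duration")])
```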
Outliers: After reviewing the summary of ride_duration and ride_distance, I saw that the maximum time spent on one bike trip was 5 weeks and the maximum distance was 741 miles. This prompted me to check for and eliminate outliers, which I removed using the interquartile range (IQR) method.
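One common form of the interquartile method keeps only values within 1.5 times the IQR of the first and third quartiles. The sketch below assumes that 1.5 multiplier, which the write-up does not specify, and uses made-up durations in minutes with one extreme value of 5 weeks.

```r
# Hypothetical ride durations in minutes; 50400 minutes is 5 weeks.
ride_duration <- c(5, 8, 10, 12, 15, 18, 20, 50400)

# First and third quartiles, and the interquartile range between them.
q1  <- quantile(ride_duration, 0.25, names = FALSE)
q3  <- quantile(ride_duration, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (assumed multiplier).
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Keep only the values that fall inside the fences.
filtered <- ride_duration[ride_duration >= lower & ride_duration <= upper]

print(c(lower = lower, upper = upper))
print(filtered)
```

The same filter would be applied to ride_distance, and in a data frame the logical condition would select rows rather than vector elements.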
I want to understand how casual riders are using Cyclistic in comparison to riders that are members.
Again, the R script and the visualizations can be found here.
On average, casual riders spend more time using the bikes than riders with an annual membership. Casual riders spend close to 30 minutes per ride, while members spend around 13 minutes.
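A comparison like this can be computed by grouping ride_duration by rider type. The following base-R sketch uses four invented rides, so its averages are illustrative only and are not the figures reported above.

```r
# Hypothetical cleaned trips with rider type and duration in minutes.
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  ride_duration = c(20, 30, 10, 14)
)

# Mean ride duration for each rider type.
avg_duration <- aggregate(ride_duration ~ member_casual, data = trips, FUN = mean)

print(avg_duration)
```

Adding a day-of-week column to the formula (for example ride_duration ~ member_casual + day_of_week) would give the per-day averages in the same way.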
For casual riders, the average time spent on a bike ride increases on Saturday and Sunday. For members, ride durations are very similar regardless of the day.
On the weekends, the number of bike rides increases for casual riders but decreases for members.
Between about 3PM and 7PM, total rides increase for both casual riders and members, mostly among casual riders. The peak hour, with the most rides, is around 5PM.
On Tuesday, Wednesday and Thursday, the largest number of casual riders use Cyclistic around 5PM. The Monday through Friday graphs show two peaks, one in the morning and one at 5PM. Casual riders could possibly be using the bikes to commute.
August was the month with the most rides. Total rides increase in the summer and decrease in the winter.
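Hourly peaks like these can be found by extracting the start hour from started_at and counting rides per hour for each rider type. This is a rough base-R sketch on five invented trips; the hour column and the variable names are my own.

```r
# Hypothetical trips on a Tuesday and a Wednesday.
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "casual", "member"),
  started_at = as.POSIXct(c("2022-08-09 17:10:00", "2022-08-09 17:45:00",
                            "2022-08-09 08:05:00", "2022-08-10 17:20:00",
                            "2022-08-10 17:30:00"), tz = "UTC")
)

# Hour of day (0-23) in which each ride started.
trips$hour <- as.integer(format(trips$started_at, "%H"))

# Cross-tabulate ride counts: rider type in rows, start hour in columns.
rides_by_hour <- table(trips$member_casual, trips$hour)

# Hour with the most casual rides.
casual_counts <- rides_by_hour["casual", ]
peak_hour <- as.integer(names(casual_counts)[which.max(casual_counts)])

print(rides_by_hour)
print(peak_hour)
```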
The classic bike is the most popular type of bike for both casual riders and members.
Members' routes are more spread out, and members use more routes than casual riders.
Casual riders use routes that are more concentrated within the city.
How do annual members and casual riders differ?
The casual rider, on average, spends more time on the bike than an annual member. Casual riders spend 15 minutes per trip while annual members spend 11 minutes. For annual members, the ride duration is roughly the same regardless of the day; for casual riders, the ride duration increases during the weekend. Although the data shows that both casual riders and annual members use the bikes to commute during weekdays, annual members do so at a higher frequency, while casual riders use the bikes more during the weekends for leisure.
Why would casual riders buy a membership?
Based on the maps, casual riders are using the bikes to visit small sections of the city for a longer period of time. This leads me to believe casual riders could be tourists or people new to the city. They are most likely using the bikes to sightsee or for other leisure activities. To convince them to buy a membership, appeal to their travel lifestyle. Mention the other cities and states that have Cyclistic bikes. Show that the brand is nationwide and that a membership would benefit the consumer in the long run. They can use the membership while traveling and when they return home.
How can Cyclistic use digital media to influence casual riders to become members?
The analysis of the data showed that the best times to influence casual riders to buy a membership are:
a) The weekends before 3PM
b) Tuesday, Wednesday and Thursday around 5PM
c) In the summer months, specifically August.
During these time periods, increase the number of ads running. On weekends, thirty minutes to an hour before 3PM, I would send app or email notifications with a limited-time offer on the annual pass.
Employ the same tactic for the Tuesday, Wednesday and Thursday 5PM time slot, but with some variation. Since it seems that casual riders are using the bikes to commute on those days, I would run ads not only around 8AM and 4PM but also before or during lunch breaks, when riders are most likely to be on their phones.
Run promotional deals during the summer months when it is warmer, specifically in July, since it is the month with the most casual riders.
Implement a new annual membership tier at a lower price point that allots a total number of rides for a given time period (e.g. week, month), in contrast to the current annual membership structure (unlimited number of rides, 45-minute limit per ride), given that casual riders, on average, spend more time on their rides than current members.