Case Study 1 Case Study: How Does a Bike-Share Navigate Speedy Success?

 💻 Manchal Prashanth Reddy                                                                                                                          🔗LinkedIn

Introduction

Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge and skills to potential employers.

 Scenario :

 You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams :

 ● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

 ● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.

 ● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

About the company :

 In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.


Ask Phase  :

  How do annual members and casual riders use Cyclistic bikes differently?

Business Task:  The using pattern of annual members and casual riders in order to launch marketing campaign.

Tools to be used : SQL , R , Excel

Additionally I used Power BI as well to create some meaningful visualizations.

Prepare Phase :

Dataset is available under this license  and can be found here .

I downloaded the data from the divvy trip data and stored in desktop.

Process and analyze Phase :

In SSMS:

I used a variety of tools in-order to process and standardize the data. First I imported all the datasets in Microsoft sql server management studio. I appended all the datasets with the join query after making sure all the columns were matched and the datatypes were synchronized.

Problems : initially the datatypes were little bit different at the time of importing the data as the ssms automatically adjusted the datatype.

After join query I deleted the unwanted columns, for this particular case I deleted the latitude , longitude , start station Id , end Station Id.

I imported this data to the power BI to check the quality of columns as Power BI has capacity to autodetect errors, blank spaces. I replaced null station names to NA as to avoid confusion.

IN R :

I did the rest of the things in R

Here is the process and Steps

installing and reading Packages

library(tidyverse)

── Attaching core tidyverse packages ───────────────────────────────── tidyverse 2.0.0 ──

dplyr    1.1.3     readr    2.1.4

forcats  1.0.0     stringr  1.5.0

ggplot2  3.4.3     tibble   3.2.1

lubridate 1.9.3     tidyr    1.3.0

purrr    1.0.2     

── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──

dplyr::filter() masks stats::filter()

dplyr::lag()    masks stats::lag()

library(lubridate)

##
## Attaching package: ‘lubridate’

## The following objects are masked from ‘package:base’:
##
## date, intersect, setdiff, union

library(ggplot2)
library(dplyr)
library(skimr)

Reading the cleaned data set /checking the Structure 

`biketripdata` <- read.csv(“year1.csv”)

head(`biketripdata`) 

##        ride_id rideable_type              started_at
## 1 169C7CEEFA777325   docked_bike 2021-04-17 04:19:01.0000000
## 2 169C90EB4F725D4F   docked_bike 2021-07-16 14:04:39.0000000
## 3 169C9918EF1365A2 electric_bike 2021-06-11 14:50:27.0000000
## 4 169CA05C802CECBC electric_bike 2021-07-09 16:39:20.0000000
## 5 169CA7E2D1C4DF8A   docked_bike 2020-08-01 11:38:55.0000000
## 6 169CAC157CA81FFE  classic_bike 2021-06-04 17:06:05.0000000
##                  ended_at        start_station_name
## 1 2021-04-17 04:24:02.0000000 Greenview Ave & Fullerton Ave
## 2 2021-07-16 14:59:07.0000000            Dusable Harbor
## 3 2021-06-11 15:02:48.0000000  Racine Ave & Washington Blvd
## 4 2021-07-09 17:04:37.0000000    East End Ave & 87th St
## 5 2020-08-01 11:39:14.0000000      Morgan Ave & 14th Pl
## 6 2021-06-04 17:27:25.0000000                Walsh Park
##             end_station_name member_casual
## 1 Sheffield Ave & Wrightwood Ave    casual
## 2 Lake Shore Dr & North Blvd    casual
## 3      Wood St & Chicago Ave    casual
## 4               Calumet Park    member
## 5       Morgan Ave & 14th Pl    casual
## 6   Humboldt Blvd & Armitage Ave    casual

Checking the structure :

str(biketripdata)

## ‘data.frame’: 4731081 obs. of  7 variables:
##  $ ride_id       : chr  “169C7CEEFA777325” “169C90EB4F725D4F” “169C9918EF1365A2” “169CA05C802CECBC” …
##  $ rideable_type : chr  “docked_bike” “docked_bike” “electric_bike” “electric_bike” …
##  $ started_at    : chr  “2021-04-17 04:19:01.0000000” “2021-07-16 14:04:39.0000000” “2021-06-11 14:50:27.0000000” “2021-07-09 16:39:20.0000000” …
##  $ ended_at      : chr  “2021-04-17 04:24:02.0000000” “2021-07-16 14:59:07.0000000” “2021-06-11 15:02:48.0000000” “2021-07-09 17:04:37.0000000” …
##  $ start_station_name: chr  “Greenview Ave & Fullerton Ave” “Dusable Harbor” “Racine Ave & Washington Blvd” “East End Ave & 87th St” …
##  $ end_station_name  : chr  “Sheffield Ave & Wrightwood Ave” “Lake Shore Dr & North Blvd” “Wood St & Chicago Ave” “Calumet Park” …
##  $ member_casual : chr  “casual” “casual” “casual” “member” …


Checking the data summary :

skim(biketripdata)

Data summary

Variable type: character  :


As the date in not in data format I created new column as ridedate and assigned data type in ride_date

biketripdata$rideDate<-as.Date(biketripdata$started_at)

as we need the ride duration ; start date and end data are also assigned as Datetime data type

biketripdata$started_at<-as_datetime(biketripdata$started_at)
biketripdata$ended_at<-as_datetime(biketripdata$ended_at)

Creating the month , day , year , day of week as we need them further in analysis or the ride

biketripdata$month<-format(as.Date(biketripdata$rideDate),”%B”)
biketripdata$day <-format(as.Date(biketripdata$rideDate),”%d”)
biketripdata$year<-format(as.Date(biketripdata$rideDate),”%Y”)
biketripdata$day_of_week<-format(as.Date(biketripdata$rideDate),”%A”)

##checking the added column names again

colnames(biketripdata)

##  [1] “ride_id”        “rideable_type”  “started_at”      
##  [4] “ended_at”       “start_station_name” “end_station_name”
##  [7] “member_casual”  “rideDate”       “month”           
## [10] “day”            “year”           “day_of_week”

checking the quality of all things

skim(biketripdata)

Data summary

Variable type: character 

##Creating the column – length of Ride and checking the summary

biketripdata$length_of_ride=difftime(biketripdata$ended_at,biketripdata$started_at)
summary(biketripdata$length_of_ride)

##   Length Class Mode
##  4731081 difftime  numeric

##Changing the column into numeric

biketripdata$length_of_ride=as.numeric(biketripdata$length_of_ride)

##filtering the length of ride less than 0 seconds

biketripdata_V2<-filter(biketripdata,length_of_ride>0)

##minimum Length of ride and maximum length of ride

min(biketripdata_V2$length_of_ride)

## [1] 1

max(biketripdata_V2$length_of_ride)

## [1] 3356649

##average length of ride along with min and max

biketripdata_V2%>%summarise(min_ride_length=min(length_of_ride),max_ride_length=max(length_of_ride),average_length_ride=mean(length_of_ride))

##   min_ride_length max_ride_length average_length_ride
## 1           1     3356649        1606.376

##length of Ride by member_type

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual,FUN = mean)

##   biketripdata_V2$member_casual biketripdata_V2$length_of_ride
## 1                    casual                   2267.697
## 2                    member                  1077.608

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual,FUN = median)

##   biketripdata_V2$member_casual biketripdata_V2$length_of_ride
## 1                    casual                       1079
## 2                    member                        629

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual,FUN = max)

##   biketripdata_V2$member_casual biketripdata_V2$length_of_ride
## 1                    casual                    3356649
## 2                        member                    2005282

Before proceeding with the weekday analysis , lets sort out the things by weekday

biketripdata_V2$day_of_week<-ordered(biketripdata_V2$day_of_week,levels=c(‘Monday’,’Tuesday’,’Wednesday’,’Thursday’,’Friday’,’Saturday’,’Sunday’))

##mean length of ride by membertype and day of week

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual+biketripdata_V2$day_of_week,FUN=mean)

## biketripdata_V2$member_casual biketripdata_V2$day_of_week
## 1                     casual                  Monday
## 2                     member                  Monday
## 3                     casual                 Tuesday
## 4                     member                 Tuesday
## 5                     casual               Wednesday
## 6                     member               Wednesday
## 7                     casual                Thursday
## 8                     member                Thursday
## 9                     casual                  Friday
## 10                    member                  Friday
## 11                    casual                Saturday
## 12                    member                Saturday
## 13                    casual                  Sunday
## 14                    member                  Sunday
## biketripdata_V2$length_of_ride
## 1                   2170.0176
## 2                    855.2013
## 3                   1958.3320
## 4                    832.1721
## 5                   2527.8431
## 6                   2112.8943
## 7                   1933.2624
## 8                    831.0379
## 9                   2088.6483
## 10                   866.7661
## 11                  2374.0114
## 12                   971.0323
## 13                  2550.7374
## 14                  1003.5724

##Maximum length of ride by member type and weekday name

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual+biketripdata_V2$day_of_week,FUN=max)

## biketripdata_V2$member_casual biketripdata_V2$day_of_week
## 1                     casual                  Monday
## 2                     member                  Monday
## 3                     casual                 Tuesday
## 4                     member                 Tuesday
## 5                     casual               Wednesday
## 6                     member               Wednesday
## 7                     casual                Thursday
## 8                     member                Thursday
## 9                     casual                  Friday
## 10                    member                  Friday
## 11                    casual                Saturday
## 12                    member                Saturday
## 13                    casual                  Sunday
## 14                    member                  Sunday
## biketripdata_V2$length_of_ride
## 1                     2033524
## 2                     2005282
## 3                     2335375
## 4                      433425
## 5                     3257001
## 6                     1742998
## 7                     2946429
## 8                    1682901
## 9                     3341501
## 10                     713853
## 11                    3356649
## 12                     990269
## 13                    3235296
## 14                    1870176

sorting the data by months

biketripdata_V2$month<-ordered(biketripdata_V2$month,levels=c(‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘November’, ‘December’))

mean ride length by months and Rider type

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual+biketripdata_V2$month,FUN=mean)

## biketripdata_V2$member_casual biketripdata_V2$month
## 1                     casual           January
## 2                     member           January
## 3                     casual          February
## 4                     member          February
## 5                     casual             March
## 6                    member             March
## 7                     casual             April
## 8                     member             April
## 9                     casual               May
## 10                    member               May
## 11                    casual              June
## 12                    member              June
## 13                    casual              July
## 14                    member              July
## 15                        casual            August
## 16                    member            August
## 17                    casual         September
## 18                    member         September
## 19                    casual           October
## 20                    member           October
## 21                    casual          November
## 22                    member          November
## 23                    casual          December
## 24                    member          December
## biketripdata_V2$length_of_ride
## 1                   1541.0754
## 2                    772.3612
## 3                   2962.6862
## 4                   1081.4072
## 5                   2289.6329
## 6                    838.2321
## 7                   2281.5631
## 8                    881.4281
## 9                   2294.1080
## 10                   878.4114
## 11                  2227.5393
## 12                   880.7155
## 13                  1967.5752
## 14                   854.4311
## 15                  2686.9206
## 16                  1004.9515
## 17                  2287.2709
## 18                   927.9440
## 19                  1809.8691
## 20                   839.4808
## 21                  3439.3526
## 22                  3795.2678
## 23                  1610.9252
## 24                   764.6206

##counting the total ride per month for the member types

biketripdata_V2%>%count(month,member_casual)

##    month member_casual  n
## 1 January    casual  18117
## 2 January    member  78713
## 3   February    casual  10130
## 4   February    member  39488
## 5  March    casual  84029
## 6  March    member 144457
## 7  April    casual 136590
## 8  April    member 200607
## 9    May    casual 256888
## 10   May    member 274693
## 11  June    casual 370639
## 12  June    member 358895
## 13  July    casual 442019
## 14  July    member 380322
## 15 August    casual 289608
## 16 August    member 332642
## 17 September    casual 230669
## 18 September    member 302230
## 19   October    casual 144994
## 20   October    member 243619
## 21  November    casual  88169
## 22  November    member 171897
## 23  December    casual  29998
## 24  December    member 101189

counting the ridable type for Riders

biketripdata_V2 %>%count(member_casual,rideable_type)

##   member_casual rideable_type   n
## 1    casual  classic_bike  695208
## 2    casual   docked_bike  760291
## 3    casual electric_bike  646351
## 4    member  classic_bike 1090190
## 5    member   docked_bike  797881
## 6    member electric_bike  740681

Visualization Phase :

Summary of the average ride length per day of week and Rider type

biketripdata_V2%>%group_by(member_casual,day_of_week)%>%summarise(average_ride_length=mean(length_of_ride))%>%
  ggplot(aes(x=member_casual,y=average_ride_length,fill=day_of_week)) +
  geom_bar(position=”Dodge”,stat = “identity”) +
  labs(title=”distribution of average ride length by week”,subtitle=”sorted by membership”)

## `summarise()` has grouped output by ‘member_casual’. You can override using the `.groups` argument.

  The below Plot over Period of One Year data :

 

 The below Plot over Period of One Month data :



Total Rides per year and membership type

biketripdata_V2%>%group_by(year,member_casual)%>%summarise(Ridenumbers=n())%>%
  ggplot(aes(x=year,y=Ridenumbers, fill=member_casual)) +
  geom_bar(position=’Dodge’,stat=’identity’) +
  labs(title=”distribution of ride numbers by year”,subtitle=”Membership comparision”)

The below Plot over Period of One Year data :


The below Plot over Period of One Month data :


Alternatively I created the visualization in Power BI as well to utilize my skills:


Key Takeaways 

• The number of trips from both user bases increase during the Warm Season in Chicago [June 3rd to September 20th]. 

• Casual Riders peak and the number of Members' trips during surpass weekends. 

• The peak number of trips from both Members and Casual Riders occurs during Chicago rush hour. 

• Casual Riders spend an average 50% more time per each trip.

ACT Phase :

Now on the basis of our above steps ask , prepare , process, analyze , share I have come into following recommendations : 

5. The most popular bikes are Docked bike which are preferred means of bike for casual riders so company can focus more on Docked bikes.