PREPARE
In this second phase, CSV spreadsheets containing monthly and quarterly ridership data were provided by the Chicago Transport Authority under license for data analysis.
1) Checked that the data was credible and free of bias. It came from the original source that collected it.
2) Checked that the data contained no personally identifiable information about riders, such as names, addresses, or credit card details.
PROCESS
I started processing the monthly CSV sheets in Google Sheets, checking for duplicate and incomplete entries.
The files had hundreds of thousands of entries, which was too many for Google Sheets to handle, so I switched to RStudio Desktop.
Loaded the quarterly CSV files into RStudio.
Removed duplicate entries.
Wrote code to make the column headings the same across all files.
Split the start date and start time into separate new columns.
Split the end date and end time into separate new columns.
Created a new column for ride length (end time minus start time).
Added a new column for the weekday on which each ride was taken.
Created two filter views to group rides by members and rides by casual riders.
Removed entries for rides that were taken out for maintenance.
Finally, I joined the four cleaned quarterly CSV files into one dataset.
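
The cleaning steps above can be sketched roughly as follows. This is a Python illustration using only the standard library, not the R code used in RStudio. The column names (`ride_id`, `started_at`, `ended_at`, `member_casual`), the header aliases, and the timestamp format are all assumptions, since the actual file layout is not shown here.

```python
import csv
import io
from datetime import datetime

# Hypothetical header aliases: the real quarterly files may use other names.
HEADER_ALIASES = {
    "trip_id": "ride_id",
    "starttime": "started_at",
    "stoptime": "ended_at",
    "usertype": "member_casual",
}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
REQUIRED = ("started_at", "ended_at", "member_casual")


def read_quarter(csv_text):
    """Parse one quarterly file and rename its headers to the shared names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({HEADER_ALIASES.get(k, k): v for k, v in raw.items()})
    return rows


def clean(rows):
    """Drop duplicate and incomplete rows, then add the derived columns."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier entry
        seen.add(key)
        if not all(row.get(col) for col in REQUIRED):
            continue  # incomplete entry
        start = datetime.strptime(row["started_at"], TIME_FORMAT)
        end = datetime.strptime(row["ended_at"], TIME_FORMAT)
        out = dict(row)
        out["start_date"] = start.date().isoformat()
        out["start_time"] = start.time().isoformat()
        out["end_date"] = end.date().isoformat()
        out["end_time"] = end.time().isoformat()
        out["ride_length_min"] = (end - start).total_seconds() / 60
        out["weekday"] = WEEKDAYS[start.weekday()]
        cleaned.append(out)
    return cleaned


def combine(quarters):
    """Join all quarterly files into one cleaned list of rides."""
    all_rows = []
    for csv_text in quarters:
        all_rows.extend(read_quarter(csv_text))
    return clean(all_rows)
```

The maintenance filter is left out of the sketch because the rule for identifying those rows is not described here.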
ANALYSIS
Calculated the total number of rides by members.
Calculated the total number of rides by casual riders.
Calculated ride length for members.
Calculated ride length for casual riders.
Calculated the mean, median, max, and min ride length for each group.
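
As an illustration of these calculations, here is a minimal Python sketch (again, not the original R code). It assumes each ride is a record with a `member_casual` field and a `ride_length_min` field; both names are assumptions, and the sample rows are made up for demonstration.

```python
from statistics import mean, median


def summarize(rides):
    """Group ride lengths by rider type and compute summary statistics."""
    groups = {}
    for ride in rides:
        groups.setdefault(ride["member_casual"], []).append(
            ride["ride_length_min"]
        )
    return {
        rider_type: {
            "rides": len(lengths),
            "mean": mean(lengths),
            "median": median(lengths),
            "max": max(lengths),
            "min": min(lengths),
        }
        for rider_type, lengths in groups.items()
    }


# Made-up sample rows for illustration only, not the real ridership data.
sample = [
    {"member_casual": "member", "ride_length_min": 10},
    {"member_casual": "member", "ride_length_min": 20},
    {"member_casual": "member", "ride_length_min": 30},
    {"member_casual": "casual", "ride_length_min": 40},
    {"member_casual": "casual", "ride_length_min": 60},
]
stats = summarize(sample)
```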
SHARE
Visualized the results as bar graphs:
1) Ride length by weekday for member and casual riders.
2) Number of rides by weekday for member and casual riders.
The graphs can be seen on the Tableau Public dashboard. The link is provided on page two of this portfolio.
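
The numbers behind the two bar graphs come from grouping rides by weekday and rider type. A rough Python sketch of that aggregation follows. The field names `weekday`, `member_casual`, and `ride_length_min` are assumptions, and the dashboard itself was built in Tableau, not with this code.

```python
from collections import defaultdict

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def by_weekday(rides):
    """Return ride count and average ride length per weekday and rider type."""
    # (weekday, rider type) -> [ride count, total minutes]
    totals = defaultdict(lambda: [0, 0.0])
    for ride in rides:
        entry = totals[(ride["weekday"], ride["member_casual"])]
        entry[0] += 1
        entry[1] += ride["ride_length_min"]

    # Order rows Monday to Sunday, then by rider type.
    ordered = sorted(
        totals.items(),
        key=lambda item: (WEEKDAYS.index(item[0][0]), item[0][1]),
    )
    return [
        {
            "weekday": weekday,
            "member_casual": rider_type,
            "rides": count,
            "avg_ride_length_min": minutes / count,
        }
        for (weekday, rider_type), (count, minutes) in ordered
    ]
```

Each returned row corresponds to one bar: `rides` feeds the ride-count graph and `avg_ride_length_min` feeds the ride-length graph.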
ACT
Members take more rides, but casual riders take longer rides.
Further analysis is needed to compare the cost per ride for similar or identical rides (based on duration and distance) between members and casual riders. If that analysis shows that a similar ride costs more for a casual rider, the findings could be used in marketing to encourage casual riders to buy memberships, especially riders who frequently buy single-ride passes for the same route.
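
One way the proposed cost comparison could be framed is as a break-even calculation: how many single rides it takes before a membership becomes cheaper. The Python sketch below is only an illustration. The function names are hypothetical, and real fares would have to be supplied as inputs.

```python
import math


def break_even_rides(single_ride_price, membership_price):
    """Smallest number of single rides whose total cost exceeds the membership price."""
    return math.floor(membership_price / single_ride_price) + 1


def cheaper_option(rides, single_ride_price, membership_price):
    """Say which option costs less for a given number of rides."""
    if rides * single_ride_price > membership_price:
        return "membership"
    return "single rides"
```

For example, with placeholder prices of 3 per ride and 100 per membership (not actual fares), `break_even_rides(3, 100)` gives 34, because 33 rides cost 99 and 34 rides cost 102.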