Deliverable for the Prepare phase

A description of all data sources used


1) Where is your data located?

The data is publicly available at the following link:

Index of bucket "divvy-tripdata"

The required files were downloaded, cleaned, and uploaded to RStudio Cloud, where the analysis is conducted in R.


2) How is the data organized?

The data is split into files by quarter and by month. To analyze the most recent year, April 2021 through March 2022, the .csv files covering that period are used.
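As a minimal sketch of how the files for this period might be gathered, the Python below builds one file name per month from April 2021 through March 2022 and stacks the rows of those files. The analysis itself is done in R; Python is used here only for illustration. The `YYYYMM-divvy-tripdata.csv` naming pattern, the one-file-per-month assumption, and both helper names are hypothetical, so check them against the files actually downloaded.

```python
import csv


def monthly_file_names(start_year, start_month, n_months=12):
    """Build file names for consecutive months.

    Assumes the hypothetical pattern YYYYMM-divvy-tripdata.csv.
    """
    names = []
    year, month = start_year, start_month
    for _ in range(n_months):
        names.append(f"{year}{month:02d}-divvy-tripdata.csv")
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
    return names


def combine_csv_files(paths):
    """Read each CSV file and stack all of its rows into one list of dicts."""
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

For example, `combine_csv_files(monthly_file_names(2021, 4))` would read the twelve files from the working directory, provided they exist under those names.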


3) Are there any issues with bias or credibility in this data? Does your data ROCCC?

Because the data comes directly from Cyclistic, the bike-share company itself, its credibility is not in question. It fully adheres to ROCCC:


R->Reliable

O->Original

C->Comprehensive

C->Current

C->Cited


4) How are you addressing licensing, privacy, security, and accessibility?


The license for the data is available at the following link:

https://www.divvybikes.com/data-license-agreement

Based on a review of this license, using the data does not violate any data privacy requirements.


5) How did you verify the data’s integrity?


The data was checked against the four qualities of data integrity (accuracy, completeness, consistency, and trustworthiness) and meets all of them. It is complete because it contains all the components required to describe each ride. It is consistent across the whole period: every CSV file has the same number of columns with the same data types. Because its credibility was established above, it is also trustworthy.
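The consistency check described above could be sketched in Python as follows; this is an illustration only, since the analysis itself uses R. It assumes each file's contents have been read into a string and that timestamps follow the `%Y-%m-%d %H:%M:%S` layout; both the format and the helper names are assumptions made for this example.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout, e.g. "2021-04-01 08:00:00".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def headers_consistent(csv_texts):
    """Return True if every CSV text starts with exactly the same header row."""
    headers = [next(csv.reader(io.StringIO(text))) for text in csv_texts]
    return all(header == headers[0] for header in headers)


def column_parses_as_datetime(csv_text, column, fmt=TIMESTAMP_FORMAT):
    """Return True if every value in `column` parses as a timestamp."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            datetime.strptime(row[column], fmt)
        except (ValueError, TypeError):  # malformed text or a missing cell
            return False
    return True
```

Matching headers cover the "same number of columns" part of the claim, and parsing a timestamp column is one way to confirm that a column holds the expected data type in every file.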



6) How does it help to answer your question?


The existing fields, namely the date-time stamps started_at and ended_at along with rideable_type, make it possible to compare how annual members and casual riders use the bikes, for example by ride duration and by bike type. Analyzing this comparison can help answer the business question of how to convert casual riders into annual members.
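A minimal Python sketch of that comparison follows, for illustration only (the analysis itself is done in R). It assumes each row is a dict with `started_at`, `ended_at`, `rideable_type`, and a rider-type column named `member_casual` holding values such as "member" or "casual". The `member_casual` column name, the timestamp format, and the helper names are assumptions to verify against the files.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "2021-04-01 08:00:00".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def ride_length_minutes(started_at, ended_at, fmt=TIMESTAMP_FORMAT):
    """Ride duration in minutes, computed from the two timestamp strings."""
    start = datetime.strptime(started_at, fmt)
    end = datetime.strptime(ended_at, fmt)
    return (end - start).total_seconds() / 60


def mean_ride_length_by_rider_type(rows):
    """Average ride length for each value of the (assumed) member_casual column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        rider = row["member_casual"]
        totals[rider] += ride_length_minutes(row["started_at"], row["ended_at"])
        counts[rider] += 1
    return {rider: totals[rider] / counts[rider] for rider in totals}


def bike_type_counts(rows):
    """Number of rides for each (rider type, rideable_type) pair."""
    return Counter((row["member_casual"], row["rideable_type"]) for row in rows)
```

Grouping by rider type keeps the two populations side by side, so a difference in typical ride length or preferred bike type shows up directly in the returned dictionaries.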


7) Are there any problems with the data?

The data contains duplicate rows, and a few rows contain 'N/A' values; both need to be removed.
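One way to sketch this cleaning step in Python is shown below, for illustration only (the analysis itself is in R). It assumes rows are dicts such as those produced by `csv.DictReader`. Here a duplicate means a row identical in every field, and a cell counts as missing if it is empty or reads "N/A" or "NA"; both definitions are assumptions, and deduplicating on `ride_id` alone would be an alternative.

```python
# Values treated as missing (an assumption: empty cells plus the "N/A" and "NA" spellings).
MISSING_VALUES = {"", "N/A", "NA"}


def clean_rows(rows, missing=MISSING_VALUES):
    """Drop exact duplicate rows and rows with any missing value, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())  # identical in every field => duplicate
        if key in seen:
            continue
        seen.add(key)
        if any(value is None or value.strip() in missing for value in row.values()):
            continue
        cleaned.append(row)
    return cleaned
```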