In our project, we focus only on data for the City of San Diego from January 2017 to December 2020, which consists of 30,528 rows and 49 columns. The data cleaning process involves transforming data formats, identifying and removing entries with missing or duplicated values, and dropping redundant columns. In total, 8,254 rows and 16 columns were removed, leaving 22,274 rows and 33 columns.
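The cleaning steps described above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual pipeline. The function names `clean_listings` and `report`, the column names `date`, `price`, `notes` and `id_copy`, and the toy rows are all hypothetical. The sketch also assumes one particular order: redundant columns are dropped first so that gaps in columns being discarded anyway do not cause rows to be removed.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame, date_col: str, redundant_cols: list[str]) -> pd.DataFrame:
    """Hypothetical cleaning pass: drop columns, fix formats, drop missing and duplicate rows."""
    # 1. Drop redundant columns first, so their gaps do not trigger row removals.
    out = df.drop(columns=redundant_cols)
    # 2. Transform the data format: parse the date column as datetime64;
    #    unparseable values become NaT (treated as missing in the next step).
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    # 3. Remove rows that still contain any missing value.
    out = out.dropna()
    # 4. Remove exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    return out


def report(before: pd.DataFrame, after: pd.DataFrame) -> dict[str, int]:
    """Count how many rows and columns the cleaning pass removed."""
    return {
        "rows_removed": before.shape[0] - after.shape[0],
        "cols_removed": before.shape[1] - after.shape[1],
    }


# Toy data (hypothetical): one duplicate row, one missing price, one bad date.
raw = pd.DataFrame({
    "date": ["2017-01-05", "2017-01-05", "2018-03-10", "bad", "2020-12-31"],
    "price": [100.0, 100.0, None, 250.0, 300.0],
    "notes": ["a", "a", "b", None, None],
    "id_copy": [1, 1, 2, 3, 4],
})

cleaned = clean_listings(raw, "date", ["notes", "id_copy"])
print(report(raw, cleaned))  # {'rows_removed': 3, 'cols_removed': 2}
```

Applied to the full dataset, a helper like `report` is one way to produce row and column counts such as those quoted above. The exact numbers depend on which columns are judged redundant and on the order of the steps.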