Existing literature tends to focus heavily on either the detection or the classification aspect of computer vision, as each is a vast topic in its own right. However, driving assistance systems and ITS (Intelligent Transportation Systems) require a pipeline that performs both tasks in real time. To apply the skills I had learned, and to experience firsthand the complexity of such a pipeline and its associated errors and bugs, I chose YOLOv3, a fast CNN (Convolutional Neural Network) based object detector well suited to such applications. Using the GTSDB (German Traffic Sign Detection Benchmark) to train the YOLOv3 model for traffic sign detection, I then incorporated an additional CNN (Faster R-CNN) to recognize and classify the detected traffic signs.
For the legendary Titanic machine learning competition on Kaggle, a supervised classification problem, I wanted to compare how different algorithms perform on the task and to gain firsthand experience in data cleaning and exploration.
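A comparison of that kind can be sketched as follows. This is a minimal, hedged example using scikit-learn: the synthetic dataset from `make_classification` is a hypothetical stand-in for the cleaned Titanic passenger features, and the three classifiers shown are illustrative choices rather than the exact models used.

```python
# Sketch: comparing baseline classifiers with 5-fold cross-validation.
# The synthetic data below stands in for cleaned Titanic features
# (assumption: a binary survival label and numeric features).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean cross-validated accuracy per model.
results = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in results.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

With the real competition data, the same loop makes it easy to see which family of algorithms handles the cleaned features best before any tuning.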
For an online Coursera course, a publicly accessible dataset of the physico-chemical properties of a selection of Portuguese Vinho Verde wines was provided, and I trained three models (Neural Network, Decision Tree, K-Nearest-Neighbors) to predict the property boundaries that distinguish a high-quality white wine. The dataset had minor but important issues (null values, strings requiring manipulation) that needed fixing. Data visualization was also a challenge: there were many options, and the difficulty lay in finding the one best suited to this dataset. Modeling was likewise a challenge, in the sense of trying out different parameters to find the best-fitting model. It was an opportunity to learn more about how the parameters work and what further optimization could yield better accuracy in this case.
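Training and comparing those three model families can be sketched as below. This is a hedged illustration only: scikit-learn's built-in `load_wine` dataset is used as a stand-in for the Vinho Verde data (it is a different wine dataset), and the hyperparameters shown are example values, not the tuned ones from the course.

```python
# Sketch: the three model families named above (neural network,
# decision tree, k-nearest-neighbors) on a wine-chemistry dataset.
# load_wine is a stand-in; the real project used the Vinho Verde data.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters for the MLP and KNN, so those are wrapped in pipelines.
models = {
    "neural_network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

accuracies = {
    name: m.fit(X_train, y_train).score(X_test, y_test)
    for name, m in models.items()
}
```

Parameter tuning then amounts to varying, e.g., `n_neighbors` for KNN or the tree depth, and re-checking the held-out accuracy.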
The dataset comprised cycling data from cycling.data.tfl.gov.uk, published by the Government of the United Kingdom, together with related weather data. Exploratory data analysis was conducted, variations were analyzed, correlations and outliers were checked and dealt with, and finally various machine learning models were trained on the dataset, with R² as the metric of comparison. Random Forest Regression was found to perform best, and several recommendations for future work were made.
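The model-comparison step can be sketched as follows. This is a hedged example: `make_regression` generates synthetic data standing in for the cycling/weather features (the real TfL dataset is not loaded here), and the two regressors shown are illustrative rather than the full set that was compared.

```python
# Sketch: comparing regressors by R^2 on a held-out test set.
# Synthetic data stands in for the cycling/weather features.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

r2 = {}
for name, model in {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}.items():
    model.fit(X_train, y_train)
    # R^2 on unseen data is the comparison metric, as in the project.
    r2[name] = r2_score(y_test, model.predict(X_test))
```

On the real dataset the same loop, extended with more regressors, is what identified Random Forest Regression as the strongest performer.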