PROJECT PROPOSAL

The paper we wish to reproduce is “Objects as Points” by Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. It treats object detection as the task of finding axis-aligned boxes in an image and proposes a simpler, more efficient alternative to existing detectors. Two-stage detectors recompute image features for each potential box and then classify those features. Post-processing (specifically non-maxima suppression) then removes duplicate detections of the same instance by computing bounding box IoU. This post-processing is hard to differentiate and train, which keeps most such detectors from being trained end to end.
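To make the post-processing step concrete, here is a minimal sketch of bounding box IoU and greedy non-maxima suppression in plain Python. It is our own illustration, not code from the paper; the function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative data (assumed): two overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The keep-or-discard decision inside the loop is a hard comparison, which is one way to see why this step is difficult to differentiate and train.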

Zhou, Wang, and Krähenbühl represent each object by a single point at the center of its bounding box. Other properties, such as object size, dimension, 3D extent, and orientation, are then regressed directly from the image features at the center location. Their method performs competitively with more sophisticated multi-stage methods and runs in real time.
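The center-point idea can be sketched as follows: given a grid of center scores and a predicted size at each grid cell, keep the cells that are local peaks and read the box off the size stored there. This is a simplified illustration under our own assumptions (plain Python lists, a 3x3 peak test, a 0.3 score threshold, no sub-pixel offset), not the authors' implementation.

```python
def decode_centers(heatmap, sizes, threshold=0.3):
    """Turn a center heatmap into boxes.

    heatmap: 2D list of scores in [0, 1], one score per grid cell.
    sizes:   2D list of (width, height) predictions, same shape as heatmap.
    Returns (score, x1, y1, x2, y2) tuples in grid coordinates, best first.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    detections = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # A cell is a peak if no cell in its 3x3 neighborhood scores higher.
            neighbors = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            ]
            if score < max(neighbors):
                continue
            # The box is centered on the peak; its size is read at that cell.
            w, h = sizes[y][x]
            detections.append((score, x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return sorted(detections, reverse=True)


# Illustrative data (assumed): a 4x4 grid with two peaks and 2x2 boxes everywhere.
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.5, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
sizes = [[(2.0, 2.0)] * 4 for _ in range(4)]
dets = decode_centers(heatmap, sizes)
```

In this sketch the cell scoring 0.5 is dropped because its neighbor scores 0.9, so each object yields one detection without any box-overlap computation.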

Other relevant papers include “Object Detection with Deep Learning: A Review” by Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu; “3D Bounding Box Estimation Using Deep Learning and Geometry” by Arsalan Mousavian, Dragomir Anguelov, John Flynn, and Jana Kosecka; and “Weighted Unsupervised Learning for 3D Object Detection” by Kamran Kowsari and Manal H. Alassaf.

All three of these papers discuss object detection at different levels and with different approaches, techniques, cameras, and images. Their topics include K-weights, clustering, and neural networks. Together, they provide a deeper understanding of the main paper we have chosen to reproduce.

Our preliminary plan for reproducing the paper is to try different datasets to identify objects in 3D, using Python in PyCharm CE or IDLE (whichever provides better interaction). We will start with still images that involve simple geometry, using the algorithm the authors provide, “CenterNet” (the name of their machine learning method). If we can handle still images well enough, we would like to test how well the method performs on real-time images from a webcam.

We will evaluate whether we have confirmed the original findings in two ways. First, we will check whether we can reproduce the reported results on the datasets the authors used in their examples. Second, we will check whether we obtain comparable results on the different datasets that we provide.

Our team consists of two members. Max will provide the objects and images and assemble them into the dataset we will use for testing. Anvitha will work on the code, fitting and testing it on the new dataset, and will figure out the webcam configuration. Because this is a two-person team, much of the work will be shared: the website creation, writing, and editing will be done by both members.