Related Work

From a broader perspective, our problem comes under one of the fundamental topics in computer vision, which is object detection in images. This task involves identi- fying and classifying objects in an image, to provide a semantic understanding to them. So far, a great deal of research has been done in this area in various applica- tions such as face detection and autonomous driving.

One emerging application is in the field of healthcare. In recent years, especially with the advent of deep learn- ing, we have seen various studies employing neural nets for tasks like screening, detecting conditions, disease di- agnosis,etc.TheworkofGhorbanietal. [3]hasshown the usage of CNNs to identify local cardiac structures and predict cardiovascular risk in patients. Similar re- search such as Wang et al. [4], Cires ̧an et al. [5] and Kashif et al. [6] have employed deep learning models to detect cancer cells in medical images like breast cancer pathology images and histology images.

Regarding the task of PPE detection as in our paper, various initiatives have been frequently taken especially during the COVID-19 pandemic. Researchers have done a lot of work in one subtask, specifically automatic detec- tionoffacialmasksusingdeeplearning.Loeyetal. [7] have implemented a hybrid approach, where a ResNet- 50 [8] model is used for feature extraction while classi- cal ML algorithms such as decision trees and SVM are used for classification.

Another common fundamental issue related to object detection is a class imbalance in datasets. Many researchers have approached this issue by either using data re-sampling techniques like under-sampling major- ity classes such as He and Garcia [9] and Chawla et al. [10] or oversampling minority classes. Other ap- proaches rely on cost-sensitive learning as in Krawchyk et al. [11] or Drummond and Holte [12] where loss is a weighted version of common loss functions that takes the class frequency into account. Trong et al. [13] have done an extensive study on the different types of cross- entropy losses and have found better class wise perfor- mances. Wanli et al. [14] has a unique hierarchical ap- proach where object classes are grouped into hierarchical clusters. With each level, each cluster is separately stud- ied and subsequently sub-clustered based on their deep representations. Few-shot learning is also a major area of interest, working to enable pretrained models to adapt to downstream tasks after fine-tuning with a very small amount of data. Wang et al. [15] provide an overview of major directions in few-shot learning. Bignyi et al. [16] uses knowledge from abundant base classes to find gen- eralizable features to detect minority classes. It employs a few-shot detection model combining the feature learner and a feature re-weighing model for the prediction.