International Conference on Image Processing
8-11 October 2023, Kuala Lumpur, Malaysia
IMPORTANT: News about the challenge session paper submission
The challenge paper submission deadline is extended by 14 days, from 26 April 2023 to 10 May 2023.
The challenge session paper submission links are now available: Link
--------------------------------------------------------------------------------------------------------------------------------------------------------
IMPORTANT: Please read the following carefully before deciding to participate in the challenge.
How to participate:
Read the whole description of this challenge given below
Access to the database will be granted after you have properly completed the registration form via the link at the top right
You may submit a paper describing your proposed solution via the paper submission link
The results of each team's solutions will be visible on the CodaLab website (the link will be provided later)
The challenge session is organized into three phases:
Phase 1: Participants train their models using the provided training set and the associated annotation file. They can evaluate their models on the validation set images and upload their results to CodaLab, where the validation set annotation file is located (deadline: April 7). The JSON result file of the tests performed on the validation set images must be named "predict.json" and uploaded to CodaLab to receive a score.
Phase 2: The validation annotation file will be provided to participants (from April 7) for tuning model parameters. Likewise, the test set images will be provided so participants can evaluate their models. The JSON result file of the tests performed on the test set images must be named "predict.json" and uploaded to CodaLab to receive a score. Tests performed on the validation set will no longer be accepted, and an error message will be returned. Before this phase's deadline (May 5), participants must also submit their solutions and models by email (more information soon). Updates after this deadline will not be considered.
Phase 3: The proposed models will be evaluated by the organizing committee on the test set (from May 5). These tests will be conducted on the same machine to ensure a fair performance evaluation. Participants are strongly encouraged to propose solutions deployable in standard runtime environments (Anaconda, Docker, etc.).
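As an illustration of the "predict.json" file expected in Phases 1 and 2, the sketch below writes detections in the standard COCO detection-results format (a list of records with image_id, category_id, bbox in [x, y, width, height], and score). The detections shown are placeholders, and the exact schema accepted by the CodaLab scorer should be confirmed against the organizers' instructions.

```python
import json

# Hypothetical detections in the standard COCO results format; values are
# placeholders, not real predictions. bbox is [x, y, width, height] in pixels.
detections = [
    {"image_id": 42, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.91},
    {"image_id": 42, "category_id": 3, "bbox": [5.0, 5.0, 30.0, 40.0], "score": 0.47},
]

# The scorer expects the file to be named exactly "predict.json".
with open("predict.json", "w") as f:
    json.dump(detections, f)
```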
The ranking of the challenge participants is updated automatically.
Participants also have the opportunity to submit a challenge paper until April 26 (see the ICIP webpage).
Participants ranked in the top 5 of the competition will also have the opportunity to submit an extended version of their paper to a Special Issue that we will organize in Elsevier Signal Processing: Image Communication (details will be provided soon).
Brief Introduction
Image acquisition conditions can significantly affect high-level computer vision tasks such as object detection, object recognition, object segmentation, depth estimation, scene understanding, and object tracking, to name a few. Improvements in sensor quality and deep learning methods have increased robustness against distortions, enabling suitable performance in various computer vision algorithms. However, even with new sensor technologies and deep learning approaches, performance remains limited in real applications where the visual scene contains both local and global distortions, as in autonomous vehicles, video surveillance, or medical robotics. Several object detection benchmark datasets have been proposed [1-3], the most popular being the MS-COCO dataset. The performance of object detection models is generally evaluated using the Mean Average Precision (mAP) metric. However, only global image distortions are considered in these experiments. For a better assessment of the robustness of object detection models, it is important to also consider the presence of local distortions and the complexity of the scenes observed in real environments, which gives such databases more realism and reliability. To this end, we built a database containing images with various global and local distortions, taking into account relevant contextual features to make the images more realistic. Our dedicated dataset comprises original and distorted images from the well-known MS-COCO dataset. The synthetic distortions are generated at several types and severity levels with respect to the scene context.
Important: The selected teams will be invited to be part of a joint paper, summarizing the top proposed solutions, to be submitted for publication in a journal.
Challenge Significance
It is important to note that the performance of most deep learning-based computer vision algorithms is limited when they are trained on image databases that do not contain distortions [4]. Indeed, image databases dedicated to benchmarking computer vision algorithms generally do not include real scenarios in which distortions arise from the acquisition conditions. The robustness of learning-based computer vision algorithms therefore depends on the representativeness and richness of the databases in terms of distortions [5]. This observation is even more pronounced in real applications, where distortions are more complex and heterogeneous than synthetically generated ones. Deep learning methods usually improve their robustness through data augmentation or dedicated architecture design. The first solution, adopted in our study, consists of adding photo-realistic synthetic distortions to the training set to improve network robustness against perturbations. This challenge will conduct the first comprehensive benchmark of the impact of a wide range of distortions on the performance of current object detection methods. In addition to conventional real distortions, the proposed database contains synthesized distortions corresponding to real and very frequent scenarios that are often neglected in other databases despite their importance, probably because of the difficulty of generating them synthetically. In the proposed database, we generate images affected by various photo-realistic distortions. This study will provide a reliable prediction of the performance of these methods in real applications thanks to the realism and coherence of our Complex Distorted COCO dataset (CD-COCO). Using the MS-COCO database as the source of original images enabled us to exploit ground-truth information to design local distortions and to build on a database well known to the object detection community.
In addition, we generated complex and photo-realistic distortions by carefully choosing the distortion type and tuning the distortion parameters with respect to the scene context. This challenge has the potential to make this comprehensive benchmark an important contribution to the design of robust object detection architectures. Moreover, the development of more effective and efficient computer vision algorithms with such a benchmark will contribute significantly to meeting the challenges of real-world industrial applications, such as robotics and autonomous navigation.
Rules of Participation
We will make our CD-COCO dataset available only to registered challengers, so that they can test their methods against local and global distortions at various severity levels. The proposed methods must localize objects as precisely as possible and determine their classes. The duration of the detection process will also be taken into account in the performance evaluation. Participants are required to submit easy-to-read, commented code for their algorithm (preferably in Matlab or Python) along with a document summarizing their method and its steps. The code should include an executable script and a corresponding readme file allowing us to test the solution on our CD-COCO test set. Illustrative results may be submitted to demonstrate the efficiency of the solution. Challengers must also report the execution time of their solution and their system configuration so that execution times can be normalized between competitors. Thus, the submitted methods must aim at the following goals:
Detect the presence of objects
Determine which class they belong to
Determine their location as precisely as possible in bounding boxes
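The three goals above can be sketched as a minimal post-processing step: each detection carries a class label and an axis-aligned bounding box, and a confidence threshold decides whether an object counts as "present". The record layout, field names, and threshold below are illustrative, not a required API.

```python
# Each detection reports: which class (category_id), where (bbox as
# [x, y, width, height]), and how confident the model is (score).
def filter_detections(raw_detections, score_threshold=0.5):
    """Keep only detections confident enough to count as a detected object."""
    return [d for d in raw_detections if d["score"] >= score_threshold]

# Illustrative raw model output for one image.
raw = [
    {"category_id": 1, "bbox": [10, 20, 50, 80], "score": 0.91},  # confident
    {"category_id": 3, "bbox": [5, 5, 30, 40], "score": 0.12},    # too weak
]
kept = filter_detections(raw)  # only the confident detection survives
```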
Evaluation Criteria
We will assess the submitted methods according to the official COCO mAP metric, which characterizes a method's precision through its ability to detect objects and locate them accurately. Accuracy and inference time (fps) will be combined into a ratio describing the efficiency of each solution. Furthermore, all methods will be tested on our lab computer with the same GPU to normalize execution times. We will produce distorted test sets with images containing distortions at random severity levels, and with severity levels increasing progressively from set to set. The evaluation will thus have two parts:
A general test set with all distortion types at random severity levels.
Test sets for each distortion type, with a specific severity level increasing progressively.
The committee will thereby be able to evaluate the overall efficiency of the solutions and compare the impact of severity levels on performance across the different test sets. The best solutions will be selected according to their general robustness, their execution time, and their robustness against each distortion type and severity level.
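At the core of the COCO mAP metric is the Intersection over Union (IoU) between a predicted box and a ground-truth box; the official metric averages precision over IoU thresholds from 0.50 to 0.95. The sketch below computes IoU for boxes in the COCO [x, y, width, height] convention, purely as an illustration; the official evaluation uses the pycocotools implementation, not this code.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as [x, y, width, height]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah  # convert to corner coordinates
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap rectangle (empty if the boxes are disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

A prediction typically counts as a true positive at a given threshold only if its IoU with a ground-truth box of the same class exceeds that threshold.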
Dataset Details
The CD-COCO dataset used in this challenge session is derived from the famous MS-COCO dataset, which contains 123K images split into three sets: a training set of 95K images, a validation set of 5K images, and a test set of 23K images. We applied dedicated distortion types at specific severity levels to the training set according to the scene context of each image. The choice of distortion type is correlated with the scene type (indoor/outdoor) and the scene context (the objects present and the scene depth). Likewise, the distortion severity level is assigned according to the object type and position (pixel and depth) for local distortions, or to atmospheric conditions (rain and haze). For example, haze and rain cannot be present in indoor scenes, and object motion blur should be correlated with the object's velocity, which depends on the object type and its position in the scene. Thus, the severity level based on the object type should reflect the object's sensitivity to a given distortion, while the object position and scene depth allow the distortions to remain coherent with the scene context and type.
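The context-aware assignment described above can be caricatured as a simple rule table. The function and distortion names below are hypothetical illustrations of the idea (e.g., that rain and haze are restricted to outdoor scenes), not the actual CD-COCO generation logic.

```python
# Hypothetical sketch: which distortion types a scene is eligible for,
# given its indoor/outdoor type. The real pipeline also uses object types,
# positions, and scene depth to tune severity, which is omitted here.
def eligible_distortions(scene_type):
    common = ["defocus_blur", "motion_blur", "noise", "compression"]
    outdoor_only = ["rain", "haze"]  # atmospheric: impossible indoors
    if scene_type == "outdoor":
        return common + outdoor_only
    return common
```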
Important: The link to access the dataset will only be provided to the registered participants.
Our CD-COCO dataset comprises local distortions such as motion blur, defocus blur, and backlight illumination applied to objects or specific areas. It is worth noting that the weighting and magnitude of each distortion are adjusted according to the position of the object in the observed scene, so both the 2D spatial position and the depth are taken into account when applying the synthetic distortions. The database also contains global distortions related to camera parameters and characteristics, such as noise sensitivity, defocus, and instabilities, and those related to acquisition conditions, such as atmospheric turbulence, lossy compression artifacts, motion blur, and uncontrolled lighting. Among the atmospheric and weather factors affecting image acquisition quality, we consider rain and haze. The other factors, related to camera sensor limitations, are mainly noise sensitivity, contrast sensitivity, and spatial resolution. Global blur may result from camera motion and/or optical defocus, whereas local motion blur results from moving objects. Our dataset is detailed in tables 1 and 2.
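As a toy illustration of a severity-parameterized global distortion, the sketch below adds sensor-like Gaussian noise at five severity levels. The per-level sigma values are assumed for illustration; the actual CD-COCO distortions are photo-realistic and context-dependent, unlike this simplification.

```python
import numpy as np

def add_gaussian_noise(image, severity=1):
    """Add sensor-like Gaussian noise to a uint8 image.

    The sigma per severity level is an illustrative assumption,
    not the official CD-COCO parameterization.
    """
    sigma = {1: 5, 2: 10, 3: 20, 4: 35, 5: 50}[severity]
    noisy = image.astype(np.float64) + np.random.normal(0.0, sigma, image.shape)
    # Clip back to the valid 8-bit range before converting.
    return np.clip(noisy, 0, 255).astype(np.uint8)
```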
Challenge Organizers
The following team will run the challenge session:
Important Dates
Sponsorship and Awards
The coordinators will contact potential sponsors for supporting 1-3 awards for the competition winners.
References
[1] A. Beghdadi, M. Mallem, and L. Beji, "Benchmarking performance of object detection under image distortions in an uncontrolled environment," in Proc. IEEE International Conference on Image Processing (ICIP), 2022.
[2] A. Beghdadi, M. A. Qureshi, B. E. Dakkar, H. H. Gillani, Z. A. Khan, M. Kaaniche, M. Ullah, and F. A. Cheikh, "A new video quality assessment dataset for video surveillance applications," in Proc. IEEE International Conference on Image Processing (ICIP), 2022, pp. 1521-1525.
[3] I. Bezzine, Z. A. Khan, A. Beghdadi, N. Almaadeed, M. Kaaniche, S. Almaadeed, A. Bouridane, and F. Alaya Cheikh, "Video quality assessment dataset for smart public security systems," in Proc. 23rd IEEE INMIC, Bahawalpur, Pakistan, 5-7 November 2020.
[4] C. Michaelis et al., "Benchmarking robustness in object detection: Autonomous driving when winter is coming," arXiv preprint arXiv:1907.07484, 2019.
[5] D. Hendrycks and T. Dietterich, "Benchmarking neural network robustness to common corruptions and perturbations," arXiv preprint arXiv:1903.12261, 2019.
[6] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in Proc. European Conference on Computer Vision (ECCV), 2014, pp. 740-755.