SUBMISSION DEADLINE FOR ERTI CHALLENGE (ABSTRACT AND DOCKER IMAGE): JULY 31 (11:59PM Pacific Time)
In addition to the regular paper submission track, we also organize an ERTI (Embedded Real-Time Inference) challenge. In this challenge, competitors need to develop a pedestrian detection framework that runs on an NVIDIA Jetson TX2 with a minimum processing speed of 8 FPS.
Each contestant needs to submit their Jetson TX2 docker image and an abstract of their approach using the paper template of the main ICCV2021 conference.
This abstract is limited to 4 pages, excluding references. Note that all challenge submissions are NOT blind reviewed, and as such do not need to be anonymized. Note that these abstract submissions will not appear in the conference proceedings. Abstract submissions are handled through the CMT submission website:
https://cmt3.research.microsoft.com/UAVisionERTI2021/
Submission of the Jetson TX2 docker image is done through the following link:
https://www.dropbox.com/request/tAVyQ09h21TeS8tuI7qG
Please note that abstract submissions without a docker image, and docker images without an abstract, are automatically rejected. Please mention the filename of your docker image in the comment section of the CMT submission system when submitting the abstract.
NOTE /images inside the container should point to the evaluation dataset, and /results should point to a folder in which the detection results are stored (see below for more information).
NOTE Make sure to create an executable run.sh script in the /home/nvidia folder of the container that runs your code. Inference speed will be measured by timing the execution of this script. The runtime speed (FPS) will be calculated as follows: <number of test images> / <script execution time (seconds)>
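For illustration, a minimal run.sh could look like the sketch below. Here detect.py is a hypothetical entry point standing in for your own inference code; it should read the video folders mounted under /images and write the detection text files to /results (format described further down).
#!/bin/bash
# Minimal run.sh sketch (hypothetical). Replace detect.py with your own
# inference pipeline; it must read from /images and write to /results.
cd /home/nvidia
python3 detect.py --input /images --output /results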
Submit the docker using the following procedure:
Exit the prepared container and create a new image:
First, note the CONTAINER ID of the prepared container with:
docker ps -a
Then commit your container (locally):
docker commit <container id> <my_submission>
Where "my_submission" is a logical name which is linked to your abstract submission (i.e. the name of your algorithm, the name of the first author, the CMT submission id of your abstract...) with underscores in stead of spaces. During submission of your abstract in the CMT system, you will be asked to give the name of you container.
Create a tar from your container (make sure you have enough disk space left):
docker save <my_submission> | gzip -c > <my_submission>.tar.gz
NOTE This command might take a while. Due to the limited space on the TX2, it is best to export to an external USB drive.
Now upload the .tar.gz file to the Dropbox submission link provided above.
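As an optional sanity check before uploading (not part of the official procedure), you can verify that the archive restores cleanly by loading it back with docker:
docker load -i <my_submission>.tar.gz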
Challenge
The ERTI challenge is defined as follows:
Develop a pedestrian detection scheme/framework for UAV videos, optimized for high accuracy, which runs on an NVIDIA Jetson TX2 platform.
The frame rate needs to be at least 8 FPS. This includes the entire processing pipeline from reading images, pre-processing, inference and post-processing. Important: note that the input to your algorithm will be a video which is downsampled to 8 FPS! This way we ensure real-time behaviour.
You are free to use any training data available (e.g. UAV123, the VisDrone challenge data, ...).
We do NOT distribute training data for the challenge. We record and annotate a private dataset on which we perform speed and accuracy measures ourselves (more information below).
All contestants should use the Docker container set up with JetPack 4.5, which can be downloaded here. Please use tag r32.5.0 (JP4.5).
All contestants need to write a short abstract, and submit their final Jetson TX2 docker image
Our private evaluation dataset consists of multiple videos recorded at 1920x1080, using a DJI Mavic drone flying at altitudes around 15 metres. For qualitative evaluation we distribute a single video of around 2 minutes, of which a snapshot is shown here:
The full video for qualitative evaluation can be downloaded here (MOV format, 1920 x 1080, 712.5 MB). All private evaluation scenarios consist of similar viewpoint and flying altitudes.
Docker image information
As a starting point, each contestant should download a starting Docker container from:
https://ngc.nvidia.com/catalog/containers/nvidia:l4t-base/tags
Please use Jetpack 4.5, which is tag r32.5.0.
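For example, assuming the standard NGC registry path for this image, the base container can be pulled on the TX2 with:
docker pull nvcr.io/nvidia/l4t-base:r32.5.0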
This docker image consists of the following libraries:
CUDA 10.2
cuDNN 8.0
TensorRT 7.1.3
Details are found here. Contestants submit their docker image using the link above. We will evaluate and time each docker image with the docker run command.
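As a rough sketch only (the exact evaluation command is up to the organizers, and the host paths below are placeholders), such an invocation could mount the two folders described below and time the execution of run.sh:
# Hypothetical evaluation sketch; host paths are placeholders.
time docker run --runtime nvidia --rm \
    -v /host/path/to/dataset:/images \
    -v /host/path/to/output:/results \
    <my_submission> /bin/bash /home/nvidia/run.sh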
Images need to be read from a mounted folder /images, which contains a subfolder for each video. The naming convention for the video folders is as follows: video001, video002, .... Each video folder contains the images as *.jpg files with 8-digit filenames, numbered from 1 with leading zeros (e.g. 00000001.jpg, 00000002.jpg, ...).
Resulting detections should be stored as follows. For each video, create a folder with the same folder name inside the mounted folder /results. For each image in that video folder, write a text file whose filename is the image name with the extension .txt (e.g. 00000001.txt, 00000002.txt, ...). Each line in the file represents one detection, given as comma-separated values as follows:
x,y,width,height,score
All values are in non-normalised pixel coordinates based on the original input resolution, where the top-left corner of the image is coordinate (0,0). The x and y values refer to the top-left corner of the bounding box. If no detections are found, an empty text file needs to be written.
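As a concrete illustration of this format, the hypothetical commands below write a single detection (a box with its top-left corner at x=100, y=50, of size 30 x 80 pixels, with confidence 0.9) for the first frame of video001, and an empty file for a second frame without detections:
mkdir -p /results/video001
echo "100,50,30,80,0.9" > /results/video001/00000001.txt    # one detection
touch /results/video001/00000002.txt                        # no detections: empty file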
Winner
The accuracy and frame rate will be measured for each contestant. The frame rate should be at least 8 FPS. The framework with the highest accuracy wins the competition. The top three contestants will be selected for an oral presentation at our workshop.
For questions/remarks regarding the challenge e-mail: uavisionworkshop AT gmail.com.
FAQ
1. We do not have a Jetson TX2 module ourselves; do you provide a test server?
No, we assume that all contestants have their own Jetson TX2 module.
2. How many videos/pictures are there? Do you plan to open some portion of the dataset as validation set?
Currently we do not plan to provide any additional video material, apart from the demo video available on our website. You can use this video for initial testing and a (small) visual validation. The number of videos/pictures is not yet known, as we still need to record and annotate additional datasets. We hope to record around 10 videos, each consisting of about 2000 frames.
3. If there is no plan to release an additional dataset for evaluation, can we assume that the sceneries and situations are not that different from UAV123 and the VisDrone challenge?
The footage used for evaluation will be almost identical to the video available on our website. It is captured from a DJI Mavic Pro at an altitude of about 15 metres, at a resolution of 1920 x 1080, with similar viewpoints.
4. For evaluation, are the input frames of each video in sequential order or in random order?
Yes, for each video the frames are in order. Visual tracking techniques can be used. Note that we deliver videos at a speed of 8 FPS.
5. Does the 8 FPS constraint mean 8 FPS on average over the whole evaluation dataset, or 8 FPS for each evaluation video?
The number of processed images will be divided by the total runtime of the docker container. Hence, 8 FPS is the average over the entire dataset. We assume that the size of our evaluation dataset is large enough that docker/model loading times will be negligible.
6. The framework that has the "highest accuracy" wins the challenge. What is the exact metric that you use to measure "accuracy"?
We use the average precision (AP) at an IoU threshold of 0.5 as the main metric.
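For reference, the IoU (intersection over union) of a detection box and a ground-truth box follows the standard definition: the area of their intersection divided by the area of their union. A detection is counted as a true positive when this ratio is at least 0.5, and the AP is computed from the resulting precision-recall curve.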
7. Can we use any power mode on the Jetson? Or should we stick to the default? Are we allowed to freely control the clock frequency?
We will use the same power mode for all evaluations. Before the evaluation is executed we run the following commands:
* sudo nvpmodel -m 0 # set max performance mode (Max-N)
* sudo ./jetson_clocks.sh # set max static frequency to CPU, GPU and EMC clocks
8. Are multiple submissions allowed?
We allow for a maximum of three submissions for each abstract. From these three submissions, we will only report and use the one with the best AP.