Deep Learning for visual, audio, and sensor data analysis in Smart City environments

Scope and Topics of Interest

Recent advances in Deep Learning and high-performance computing have led to remarkable solutions for visual, audio, sensor, and multi-modal data analysis problems. Deep Learning-empowered systems can nowadays achieve performance levels in various data analysis tasks that are comparable to, or even exceed, those of humans. Even though these advancements have the potential to open new high-impact applications in Smart City environments, this promise has yet to be met. This is due to challenges in Smart City environments that go beyond the unrestricted analysis of visual, audio, and sensor data performed by Deep Learning models running on high-end Graphics Processing Units (GPUs). The large number of sensors (cameras, microphones, thermometers, motion sensors, etc.) available in such environments leads to enormous volumes of collected data that must be handled for effective data analytics. Rapid-response and privacy requirements prohibit transferring and processing these data on powerful servers and instead require processing at the edge. However, since the processing infrastructure imposes restrictions in terms of processing power, battery/electric power consumption, and autonomy (embedded GPUs or low-end processors used in edge and fog computing), efficient high-performing Deep Learning models, as well as effective data fusion schemes, are required. This goes beyond the current capabilities of the state of the art. Thus, novel efficient solutions are needed to successfully deploy high-performing Deep Learning models on such processing platforms.

To fully exploit Deep Learning solutions in Smart City environments, which impose restrictions on processing power and memory consumption, demand hard real-time operation and the handling of uncertainties in the processing outcomes, and require a level of interpretability, a number of challenges need to be addressed through theoretical and methodological contributions, including but not limited to:

  • Lightweight Deep Learning models for visual, audio, and sensor data analysis

  • Deep Learning models for efficient multimodal data analysis and fusion

  • Sensor time-series analysis based on Deep Learning

  • Efficient Deep Learning methodologies for the Internet of Things

  • Deep Learning methodologies for smart cities, including Federated Learning, Transfer Learning, Domain Adaptation, Split Computing

  • Deep Learning for applications in smart city environments, including smart homes, smart lighting, traffic prediction, data anonymization for visual analysis, intelligent transportation systems, vehicular networks

The Special Session will be a forum to exchange ideas and discuss new developments in Deep Learning for visual, audio, and sensor data analysis in Smart City environments. Please consider submitting your latest research on these topics.

Important Dates

Title and Abstract submission: January 31, 2022

Complete paper submission: February 14, 2022

Notification of acceptance: April 26, 2022

Final paper submission: May 23, 2022


Registration is through the conference main website; please follow the instructions provided at this link.

The Special Session is supported by the EU H2020 project Multimodal Extreme Scale Data Analytics for Smart Cities Environments (MARVEL) under GA No 957337.

For inquiries concerning this Special Session please feel free to contact us at ai [at]