2nd Workshop @ AVSS 2022



DeepView: Global Multi-Target Visual Surveillance Based on Real-Time Large-Scale Analysis


in conjunction with IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS) 2022

Accepted papers will be published in IEEE Xplore!

Venue: Virtual

Date: November 29, 2022

Start Time: 13:30 CET (UTC+1)

* Our previous workshop homepage: [Link]

Overview

In recent years, demand has grown rapidly for visual surveillance systems and smart cities capable of providing accurate traffic measurements and essential information for user-friendly monitoring and real-world applications. Such surveillance systems are generally built on large-scale camera networks and combine object detection, tracking, re-identification, and human behavior analysis. However, in many emerging applications, severe challenges remain due to the variability of real-world scenes captured by large-scale multi-view cameras, such as illumination changes, dynamic backgrounds, poor data quality, and the lack of high-quality models. To tackle these challenges, many researchers and engineers strive for robust algorithms that can be applied to large-scale surveillance systems. Building on this foundation, we aim to further boost the performance of visual surveillance systems and make breakthroughs in this area through cooperation with researchers like you. In this workshop, we seek original contributions reporting the most recent progress on computer vision methodologies for the surveillance analysis of large-scale visual content and its wide range of applications for building smart systems.


Program

5 Invited Talks + 4 Oral Presentations of Accepted Papers

Start Time: 13:30 / End Time: 17:45

All times are CET (UTC+1)

Invited Speakers


Kin-Choong Yow

Professor at University of Regina

[Topic] Visual Surveillance via Abnormal Event Detection and Localization

[Biography]

Dr. Kin-Choong Yow obtained his B.Eng. (Elect.) with First Class Honours from the National University of Singapore in 1993, and his Ph.D. from Cambridge University, UK, in 1998. He joined the University of Regina in September 2018, where he is presently a Professor in the Faculty of Engineering and Applied Science. Prior to joining UofR, he was an Associate Professor at the Gwangju Institute of Science and Technology (GIST), Republic of Korea (2013-2018), a Professor at the Shenzhen Institutes of Advanced Technology (SIAT), P.R. China (2012-2013), and an Associate Professor at Nanyang Technological University (NTU), Singapore (1998-2013). From 1999 to 2005 he served as the Sub-Dean of Computer Engineering at NTU, and from 2006 to 2008 as the Associate Dean of Admissions at NTU.

Dr. Kin-Choong Yow's research interests are in Artificial General Intelligence and Smart Environments. Artificial General Intelligence (AGI) is a higher form of machine intelligence in which the intelligent agent (or machine) is able to successfully perform any intellectual task that a human being can. He has published over 100 high-quality international journal and conference papers, and has served as a reviewer for a number of premier journals and conferences, including IEEE Wireless Communications and the IEEE Transactions on Education. He has been invited to give presentations at various scientific meetings and workshops, such as ACIRS (2018, 2019), ICSPIC (2018), and ICATME (2021). He is the Editor-in-Chief of the Journal of Advances in Information Technology (JAIT), a Managing Editor of the International Journal of Information Technology (IntJIT), and a Guest Editor of MDPI Applied Sciences. He is also a member of APEGS and ACM, and a Senior Member of the IEEE.

[Abstract]

With the many emerging challenges in public management, security, and safety, there is an increasing need to monitor public scenes through surveillance cameras. This talk discusses the anomaly detection problem as a multiple-scene formulation in a supervised learning scheme. We will explore techniques such as joint representation learning of appearance and motion, adversarial event prediction, and the use of deep learning architectures such as CNNs and RNNs to detect real-world violence scenes. We focus on the UCF-Crime dataset, which includes various abnormal, illegal, and violent behaviors captured by surveillance cameras in public places that can lead to severe problems for individuals and for society.
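
As a rough illustration of the kind of CNN+RNN pipeline the abstract mentions (a minimal sketch, not the speaker's actual model; the layer sizes, input resolution, and clip length are assumptions), a per-frame convolutional encoder can feed a recurrent network that aggregates a whole clip into a single violence score:

```python
# Minimal sketch: CNN encoder per frame + GRU over time -> violence logit.
# All architecture choices here are illustrative assumptions.
import torch
import torch.nn as nn

class ClipClassifier(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Per-frame CNN encoder: 3x64x64 RGB frame -> feat_dim vector
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # RNN aggregates the per-frame features over time
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # violence score (logit)

    def forward(self, clip):            # clip: (B, T, 3, 64, 64)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)          # h: (1, B, hidden_dim)
        return self.head(h[-1])         # (B, 1) per-clip logit

clip = torch.randn(2, 16, 3, 64, 64)    # two dummy 16-frame clips
print(ClipClassifier()(clip).shape)     # torch.Size([2, 1])
```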


Arif Mahmood (Google Scholar)

Professor at Information Technology University

[Topic] Visual Anomaly Detection in Videos

[Biography]

Dr. Arif Mahmood is currently a Professor of Computer Science, Director of the Computer Vision Lab, and Controller of Examinations at Information Technology University, Lahore, Pakistan. He has also worked as a Computer Vision Consultant with King Abdulaziz University, Jeddah, Saudi Arabia, and with Huazhong University of Science and Technology, China, in 2021. Before joining ITU, he served as a Research Assistant Professor at the School of Computer Science and Software Engineering, and later at the School of Mathematics and Statistics, at the University of Western Australia from 2012 to 2015. Previously, he was an Assistant Professor at the Punjab University College of Information Technology from 2008 to 2012. He also worked as a Postdoctoral Researcher at the College of Engineering at Qatar University from 2015 to 2018. In 2015 he successfully authored a Linkage Project grant from the Australian Research Council (ARC), "Machine Learning for Fracture Risk Assessment from Simple Radiography", with industry partners.

He has excellent teaching and research experience at several national and international universities. He also conducted a three-day seminar course on computer vision research at Guangdong University of Petrochemical Technology, China, in 2021. He was a Program Committee member for FIT 2017 and for the International Workshop on "Robust Subspace Learning and Applications in Computer Vision" held in conjunction with ICCV 2017, ICCV 2019, and ICCV 2021. His major research interests include data clustering, classification, and action and object recognition using image sets. He has also worked on computation elimination algorithms for fast template matching, video compression, object removal, and image mosaicking. At the School of Mathematics and Statistics at UWA, he worked on community detection in complex networks.

He has published extensively in prestigious journals including IEEE TPAMI, IJCV, MEDIA, IEEE TIP, IEEE TCSVT, IEEE TCC, IEEE TSC, Information Fusion, and IEEE TKDE. He has presented his work at over 17 conferences worldwide, including prestigious venues such as CVPR, ECCV, ACCV, BMVC, and WACV.

[Abstract]

Video anomaly detection is well investigated in the weakly supervised and one-class classification (OCC) settings. However, research on unsupervised video anomaly detection is quite sparse, likely because anomalies occur infrequently and are usually not well defined, which, coupled with the absence of ground-truth supervision, can adversely affect the convergence of learning algorithms. This problem is challenging yet rewarding, as it can completely eradicate the cost of obtaining laborious annotations and enable such systems to be deployed without human intervention. To this end, we propose a novel unsupervised Generative Cooperative Learning (GCL) approach for video anomaly detection that exploits the low frequency of anomalies to build cross-supervision between a generator and a discriminator. In essence, both networks are trained in a cooperative fashion, thereby facilitating the overall convergence. We conduct extensive experiments on two large-scale video anomaly detection datasets, UCF-Crime and ShanghaiTech. Consistent improvement over existing state-of-the-art unsupervised and OCC methods corroborates the effectiveness of our approach.
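
The cross-supervision idea can be sketched roughly as follows (a minimal illustration under assumptions, not the authors' implementation; the feature dimension, network sizes, and pseudo-labeling threshold are all invented for the example): an autoencoder-style generator G and a discriminator D train each other on unlabeled segment features, relying on anomalies being rare.

```python
# Illustrative sketch of generator/discriminator cross-supervision on
# unlabeled video segment features (NOT the authors' code).
import torch
import torch.nn as nn

feat_dim = 512  # assumed feature dimension
G = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x):  # x: (B, feat_dim) unlabeled segment features
    # 1) G's reconstruction error yields pseudo-labels for D: segments
    #    that are hard to reconstruct are treated as anomalous.
    with torch.no_grad():
        err = (G(x) - x).pow(2).mean(dim=1)
        pseudo = (err > err.mean() + err.std()).float()  # assumed threshold
    opt_d.zero_grad()
    bce(D(x).squeeze(1), pseudo).backward()
    opt_d.step()

    # 2) D's scores supervise G in return: down-weight segments D deems
    #    anomalous so G keeps specializing in reconstructing normal data.
    with torch.no_grad():
        w = 1.0 - torch.sigmoid(D(x)).squeeze(1)
    opt_g.zero_grad()
    ((G(x) - x).pow(2).mean(dim=1) * w).mean().backward()
    opt_g.step()

train_step(torch.randn(32, feat_dim))  # one cooperative update on dummy data
```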


Reference Papers:

[1] M. Z. Zaheer, A. Mahmood, M. H. Khan, M. Segu, F. Yu, and S. I. Lee, "Generative Cooperative Learning for Unsupervised Video Anomaly Detection", in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[2] M. Z. Zaheer, J. H. Lee, A. Mahmood, M. Astrid, and S. I. Lee, "Stabilizing Adversarially Learned One-Class Novelty Detection Using Pseudo Anomalies", in IEEE Transactions on Image Processing (TIP), 2022.

[3] M. Z. Zaheer, A. Mahmood, M. Astrid, and S. I. Lee, "CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection", in European Conference on Computer Vision (ECCV), 2020.


Kuk-Jin Yoon

Associate Professor at Korea Advanced Institute of Science and Technology

[Topic] Event-camera-based Computer Vision

[Biography]

Dr. Kuk-Jin Yoon received his B.S., M.S., and Ph.D. degrees in Electrical Engineering and Computer Science from the Korea Advanced Institute of Science and Technology (KAIST) in 1998, 2000, and 2006, respectively. He was a Postdoctoral Fellow in the PERCEPTION team at INRIA, Grenoble, France, from 2006 to 2008, and an Assistant/Associate Professor at the School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, South Korea, from 2008 to 2018. He is now an Associate Professor in the Department of Mechanical Engineering and the Graduate School of Artificial Intelligence at KAIST, South Korea, where he leads the Visual Intelligence Lab. His research interests include vision-based ADAS, stereo, 3D reconstruction, visual object tracking, SLAM, structure-from-motion, and omnidirectional and event-camera-based vision.

[Abstract]

Event cameras are bio-inspired vision sensors that mimic the human eye in the way they receive visual information. While traditional cameras transmit intensity frames at a fixed rate, event cameras transmit intensity changes at the time they occur, in the form of asynchronous events that deliver the space-time coordinates of those changes. They have many advantages over traditional cameras, e.g., low latency on the order of microseconds, high temporal resolution, and high dynamic range. However, since the output of an event camera is a sequence of asynchronous events over time rather than actual intensity images, most existing algorithms cannot be applied directly. In this talk, I will introduce recent research on exploiting event cameras for various computer vision tasks, and show how event cameras can be applied in real situations to overcome the limitations and challenges of conventional cameras.
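
To make the data format concrete, here is a small sketch (the sensor resolution, window length, and list-of-tuples event format are illustrative assumptions) of how a stream of asynchronous (x, y, t, polarity) events can be accumulated over a time window into a frame-like array that conventional vision models can consume:

```python
# Sketch: accumulate asynchronous events into a frame-like tensor.
import numpy as np

H, W = 180, 240                      # assumed sensor resolution

def events_to_frame(events, t0, t1):
    """Accumulate signed event polarities in [t0, t1) into an HxW image."""
    frame = np.zeros((H, W), dtype=np.float32)
    for x, y, t, pol in events:      # pol is +1 (brighter) or -1 (darker)
        if t0 <= t < t1:
            frame[y, x] += pol
    return frame

# Three dummy events: (x, y, timestamp in seconds, polarity)
events = [(10, 20, 0.0001, +1), (10, 21, 0.0004, -1), (55, 90, 0.0007, +1)]
print(events_to_frame(events, 0.0, 0.001).sum())  # 1.0
```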

Du Yong Kim

Assistant Professor at Royal Melbourne Institute of Technology

[Topic] Random Finite Set Filters and Visual Tracking

[Biography]

Dr. Du Yong Kim received the B.E. degree in electrical and electronics engineering from Ajou University, Korea, in 2005, and the M.S. and Ph.D. degrees from the Gwangju Institute of Science and Technology, Korea, in 2006 and 2011, respectively. As a postdoctoral researcher, he worked on statistical signal processing and image processing at the Gwangju Institute of Science and Technology, Korea (2011–2012), the University of Western Australia (2012–2014), and Curtin University, Perth, Western Australia (2014–2018). He is currently a Vice-Chancellor's Research Fellow and Lecturer (Assistant Professor) at the School of Engineering, RMIT University, Australia. His main research interests include Bayesian filtering theory and its applications to machine learning, computer vision, sensor networks, and automatic control.

[Abstract]

The last decade has witnessed exciting developments in multi-target state estimation with the introduction of stochastic geometry to the field. Stochastic geometry, the marriage between geometry and probability, is a mathematical discipline that deals with random spatial patterns. Its history traces back to the problem of Buffon's needle, and it has long been used by statisticians in many diverse applications including astronomy, atomic physics, biology, sampling theory, and stereology. Since 2003, Mahler's seminal work on the random finite set approach to multi-object filtering, which culminated in the probability hypothesis density (PHD) filter, has continued to attract substantial interest from academia and industry alike. This talk presents an overview of the random finite set paradigm and its application to visual tracking, with examples drawn from visual surveillance and microscopy image analysis.
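
For reference, the PHD filter mentioned above propagates the first-moment intensity D of the multi-target random finite set through the following standard recursion (the widely cited form from Mahler's work; notation follows common usage, with p_S the survival probability, f the transition density, gamma_k the birth intensity, p_D the detection probability, g_k the measurement likelihood, and kappa_k the clutter intensity):

```latex
% Standard PHD filter recursion (Mahler, 2003), propagating the
% first-moment intensity D of the multi-target random finite set.
\begin{align*}
  % Prediction: survival + motion + birth intensity \gamma_k
  D_{k|k-1}(x) &= \int p_S(\zeta)\, f_{k|k-1}(x \mid \zeta)\,
                  D_{k-1}(\zeta)\, d\zeta + \gamma_k(x) \\
  % Update: missed detections + one term per measurement z \in Z_k
  D_k(x) &= \bigl(1 - p_D(x)\bigr)\, D_{k|k-1}(x)
    + \sum_{z \in Z_k}
      \frac{p_D(x)\, g_k(z \mid x)\, D_{k|k-1}(x)}
           {\kappa_k(z) + \int p_D(\xi)\, g_k(z \mid \xi)\, D_{k|k-1}(\xi)\, d\xi}
\end{align*}
```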


Fatma Guney

Assistant Professor at Koç University

[Topic] Stochastic Future Prediction in Real World Driving Scenarios

[Biography]

Dr. Fatma Guney received her Ph.D. from the Max Planck Institute for Intelligent Systems and later joined the University of Oxford as a postdoc. Her research interests include 3D computer vision and representation learning from video sequences. She has also worked on action recognition, motion estimation, depth estimation, and multi-view 3D reconstruction. She leads the Autonomous Vision Group (AVG) at the KUIS AI Center, which works on a range of topics including, but not limited to, monocular depth estimation and semantic segmentation, multi-object tracking, unsupervised video object segmentation, end-to-end learning of driving, and stochastic future prediction.

[Abstract]

Dr. Fatma Guney will talk about future prediction in video sequences. Her group proposes to address the inherent uncertainty of future prediction with stochastic models. While most previous methods predict the future in pixel space, they propose to predict the future also in motion space, modeling appearance and motion history separately. They then extend their solution to real-world driving scenarios, where the background moves according to the ego-motion of the vehicle. They predict the changes in the static part of the scene by modeling structure and ego-motion, and, conditioned on the static prediction, they predict the remaining changes in the dynamic part, which correspond to independently moving objects. Finally, they propose to combine information from multiple cameras into a Bird's Eye View (BEV) representation and predict the future in that compact representation, efficiently learning the temporal dynamics of the BEV representation with a state space model. Their models outperform previous methods on the standard future frame prediction datasets MNIST, KTH, and BAIR, and especially on the real-world driving datasets KITTI, Cityscapes, and NuScenes.
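
As background on the stochastic models referred to above, a common variational formulation of stochastic future prediction (an illustrative sketch of the general family of objectives, not the speaker's exact models) introduces a latent variable z_t to capture the uncertainty about frame x_t and trains with an ELBO-style objective:

```latex
% Generic stochastic video prediction sketch: a learned prior p_\psi
% proposes the latent z_t from past frames, an inference network q_\phi
% also sees the true frame x_t, and the decoder p_\theta reconstructs it;
% \beta weights the KL term.
\begin{align*}
  \mathcal{L} = \sum_{t}\Big[
      \mathbb{E}_{q_\phi(z_t \mid x_{1:t})}
        \log p_\theta(x_t \mid z_t, x_{1:t-1})
      - \beta\, D_{\mathrm{KL}}\big(q_\phi(z_t \mid x_{1:t}) \,\|\,
        p_\psi(z_t \mid x_{1:t-1})\big)\Big]
\end{align*}
```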

Organizers

Moongu Jeon

Gwangju Institute of Science and Technology (GIST)

Yuseok Bae

Electronics and Telecommunications Research Institute (ETRI)

Kin-Choong Yow

University of Regina

Jinyoung Moon

Electronics and Telecommunications Research Institute (ETRI)

Du Yong Kim

RMIT University