Video Summarization for Large-scale Analytics Workshop

·        Duration: 1 day

·        Call for Submissions (CMT submission link)

   The deluge of video and other sensor content collected from diverse sources represents an emerging wave of unstructured big data from large enterprises, including social networks, for which there is a growing demand for automated video analytics. For example, mobile smart sensors, smartphones, distributed camera array systems, wearable cameras, bodycams, dashcams, and autonomous-navigation-class devices are expected to grow to ten billion in number by 2025 and will constitute a core part of the next generation of distributed computational infrastructure supporting critical applications in the online social network, healthcare, safety, security, financial, and automotive sectors of the economy. This content, while predominantly visual, typically consists of heterogeneous data streams reflecting the physical reality around us, captured by visual, infrared, multispectral, inertial, and GPS sensors mounted on either mobile (UAV, UGV, satellite) or static platforms distributed over geographies ranging from a small building campus to a large nation or continent. The sensor data reflect physical phenomena associated with fixed, continuously moving, or intermittently observed coverage areas, depending on whether the sensors are airborne (e.g., UAVs), vehicle-mounted (e.g., UGVs, dashcams), body-mounted (e.g., cellphones, bodycams), or mounted on static structures (e.g., buildings). Typically, the data streams (data in motion) are only loosely related to each other in terms of the entities they reflect, including people, objects, events, time, and space (who/what/when/where), and to previously archived static data (data at rest). Innovative uses of these data streams have been considered for gaining better situational awareness in many new domains, ranging from surveillance to ecological and environmental mapping, urban planning, infrastructure management, and natural disaster relief, as well as biomedical and healthcare applications.

   New use cases for these data streams are continuously being developed, involving more comprehensive planning, navigation, reaction, and interaction capabilities in a variety of situations. More complex applications involving millions of pairwise associations between spatiotemporal entities will lead to a rich set of semantic questions and to natural-language query-answering systems (such as video-to-text) that analyze and describe a related collection of videos. Assisted, automated, and annotated entity relationships will enable the discovery of rich, complex spatiotemporal interactions. Exploiting this deluge of diverse and distributed data to solve contemporary problems with global societal impact is of mainstream interest and is the primary motivation for the workshop. The workshop will aid in developing tools for synthesizing a common operational picture that facilitates situational awareness across domains, with associated topics including but not limited to:


  • Georegistration, Stabilization, Mosaicing
  • 3D reconstruction
  • People/Vehicle/Object detection and tracking
  • Video content summarization
  • Activity, Event, Behavior understanding
  • Sensors, Imaging, Optics
  • Distributed sensor calibration
  • Precision timing, Navigation, Control


  • Biomedical video summarization/indexing for healthcare monitoring
  • Human-machine interface/collaboration
  • GIS Mapping
  • Classification, tagging, archival
  • Spatiotemporal anomaly detection
  • Security, Safety, Surveillance Video Applications
  • Cloud computing for video analytics
  • Privacy and Security in computer vision

The workshop will provide a unique forum for stakeholders in academia, industry, and government working on understanding broad visual content analytics in a semi-automated manner. Specific challenges in understanding aerial visual and other sensor data, such as signal noise, instability, bandwidth/computation trade-offs, algorithm approximations, human-machine interface and collaboration, and form-factor issues, are of growing interest to a diverse spectrum of stakeholders, including engineers, scientists, researchers, application developers, policy makers, and end users, all of whom are invited to submit original contributions to the workshop.


·        Final Program

8:00  Opening Remarks

8:15  Invited Talk 1: Raghuveer Rao, ARL – Hierarchical Union-of-Subspaces Model for Human Activity Summarization

9:00  Coffee Break

9:30  Invited Talk 2: Robert Pless, WashU – What do 30000 webcams tell us about how we live?

10:15  Invited Talk 3: Gerard Medioni, USC/Amazon – WAMI and scalable event recognition

11:00  Invited Talk 4: Rich Thissell & Guna, NRL – Scalable architecture for operational FMV

11:45  Lunch Break

13:15  Invited Talk 5: Ali Chaudhry, SRI – Comprehensive Human State Modeling and Its Applications

14:00  Invited Talk 6: Jiebo Luo, Univ of Rochester – Social Media as Sensors

14:45  Coffee Break

15:30  Demir & Bozma, Video Summarization via Segments Summary Graphs

15:50  Yalcin, Cevikalp, Yavuz, Towards Large-Scale Face Recognition Based on Videos

16:10  AliAkbarpour, Palaniappan, Seetharaman, Fast Structure from Motion for Sequential and Wide Area Motion Imagery

16:30  Closing Remarks


·        Important Dates


Submission Deadline:  Sep 28, 2015

Acceptance Decisions:  Oct 10, 2015

Camera Ready Paper Submission:  Oct 15, 2015

·        Organizers

K. Palaniappan (U. Missouri), Sharath Pankanti (IBM Research), Guna Seetharaman (US Naval Research Laboratory) contact: