WMD 2007

The Workshop on Massive Datasets

was held at the International Conference on Multimodal Interfaces

November 15, 2007

Nagoya, Japan

Executive Summary

This workshop was intended to encourage work and discussion on the analysis, visualization, and manipulation of long-term datasets recorded from large-scale sensor networks. Specifically, this inaugural workshop focused on a dataset released by MERL comprising one year of data from 215 motion sensors installed in the public spaces of our 3000 square meter research facility.

A few days before the workshop, Dr. Yuri A. Ivanov presented the ICMI keynote address titled "Interfacing Life: A year in the life of a research lab".


We would like to thank those who contributed documentation of the workshop in the form of position papers. We are also publishing a short introduction to the dataset. Electronic proceedings can be found in the ACM Digital Library.


We would like to thank all the guest speakers and the contributors for their interesting presentations. The conversation among the attendees during questions and breaks was lively and intelligent.


Are the tools we use to understand our data scalable to the tens of millions of records, huge spans of time, minute details of behavior, and large geographic extent that future sensor networks will generate? In the future, buildings will be studded with sensors. Every movement will generate a few bits of data. Every fluctuation in temperature will be recorded. Every deviation in lighting will be noticed. These large and complex datasets will challenge the tools we use today.

Looking into the future of residential and office buildings, Mitsubishi Electric Research Laboratories (MERL) has been collecting motion sensor data from a network of over 200 sensors for a year. The data is the residual trace of a year in the life of a research laboratory. It contains interesting spatio-temporal structure at every scale: the seconds of individuals walking down hallways, the minutes spent in lobbies chatting with colleagues, the hours of dozens of people attending talks and meetings, the days and weeks that drive the patterns of life, and the months and seasons with their ebb and flow of visiting employees.

The dataset contains well over 30 million raw motion records, spanning a calendar year and two floors of our research laboratory. As such, it presents a significant challenge for behavior analysis, search, manipulation, and visualization of the data. We have also prepared accompanying analytics such as partial tracks and behavior detections, as well as map data and anonymous calendar data marking the pattern of meetings, vacations, and holidays.

MERL has released this dataset to the community. We invite you to download the data and apply your analytic, visualization, and interface tools. The goal of the workshop is to understand the state of the art in the context of the huge, detailed dataset of the near future.


Christopher R. Wren

Yuri A. Ivanov

(Mitsubishi Electric Research Laboratories)

Program Committee

Many thanks are owed to those who agreed to serve on the committee:

Kiyoharu Aizawa (University of Tokyo)

Aaron Bobick (Georgia Institute of Technology)

Trevor Darrell (Massachusetts Institute of Technology)

Irfan Essa (Georgia Institute of Technology)

Minkyong Kim (IBM Research)

David Minnen (Georgia Institute of Technology)

Vladimir Pavlovic (Rutgers University)

Thad Starner (Georgia Institute of Technology)

Kazuhiko Sumi (Mitsubishi Electric)

Andrew Wilson (Microsoft Research)