Advanced Topics on Data Stream Mining

Albert Bifet, João Gama, Ricard Gavaldà, Georg Krempl, Mykola Pechenizkiy, Bernhard Pfahringer, Myra Spiliopoulou, Indrė Žliobaitė

ECML PKDD 2012, Bristol, Sept. 24

An estimated 5 exabytes of data are now created every two days, roughly as much as was created from the dawn of time up until 2003. Moreover, 2007 is estimated to have been the first year in which it was no longer possible to store all the data being produced. This massive volume of real-time streaming data opens up challenging new discovery tasks. Some are already addressed by mature algorithms, while new challenges continue to emerge, including learning from not one but multiple streams. This tutorial has two parts. The first part gives an introduction to recent advances in algorithmic techniques and tools for coping with the challenges of stream mining. The second part discusses state-of-the-art research on mining multiple streams: distributed streams and interdependent relational streams.


Concept drift plays a central role in this tutorial. In the first part, we address it in the context of conventional one-stream mining to set the scene. In the second part, we revisit it after introducing multiple-stream mining, and we also consider machine learning methods that are appropriate for incremental data and slow streams.

NOTICE: This tutorial is longer than the other ECML PKDD 2012 tutorials.

UPDATE: The schedule is as follows: 9:00-10:30 ‘Mining One Stream’; 10:45-12:15 ‘Mining Multiple Streams’.

Part I

The first part (9:00 – 10:30), ‘Mining One Stream’, will be presented by Albert Bifet, Ricard Gavaldà, Mykola Pechenizkiy, Bernhard Pfahringer, and Indrė Žliobaitė. 

Outline

  • Introduction to data streams and drifting data
  • Adaptive predictive models
  • Clustering streaming data
  • Pattern Mining on streams
  • Tools for mining data streams

Part II

The second part (10:45 – 12:15), ‘Mining Multiple Streams’, will be presented by João Gama, Myra Spiliopoulou, and Georg Krempl.

Outline

  • Mining distributed streams
  • Mining relational streams
  • Feedback issues in streams under drift

Slides Second Part

Presenters

Albert Bifet. Researcher at Yahoo! Research Barcelona. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the core developers of the MOA software environment for implementing algorithms and running experiments for online learning from evolving data streams.

João Gama. Researcher at LIAAD, University of Porto, working in the Machine Learning group. His main research interest is learning from data streams. He has published more than 80 articles. He served as Co-chair of ECML 2005, DS09, ADMA09 and a series of Workshops on KDDS and Knowledge Discovery from Sensor Data with ACM SIGKDD. He is the author of a recent book on Knowledge Discovery from Data Streams.

Ricard Gavaldà. Professor at the Department of Software, U. Politècnica de Catalunya – BarcelonaTech. He has published over 70 papers and supervised 7 Ph.D. students. His current research interests are algorithmics of machine learning and data mining, with emphasis on streaming and adaptive methods. He is also working on the use of data mining in autonomic and green computing.

Georg Krempl. Postdoctoral researcher in the Knowledge Management & Discovery (KMD) lab at the Otto-von-Guericke-University Magdeburg, Germany. Doctorate from the University of Graz, Austria. His main research interest is learning on evolving, drifting data. He has taught several courses on data mining, statistics and optimization to students in different degree programmes at Univ. Graz and, since 2011, at Univ. Magdeburg.

Mykola Pechenizkiy. Assistant Professor at the Department of Computer Science, Eindhoven University of Technology, the Netherlands. He has broad research interests in data mining and its application to various (adaptive) information systems serving industry, commerce, medicine and education. He has organized several workshops and conferences in these areas.

Bernhard Pfahringer. Associate Professor with the Computer Science Department of the University of Waikato. His main research interests are in Machine Learning and Data Mining, especially in efficient algorithms, stream mining, randomization, and applications.

Myra Spiliopoulou. Professor of Information Systems in the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, and Chair of the Knowledge Management & Discovery (KMD) lab. Her main research interest is mining in evolving systems. PC Co-Chair of ECML PKDD 2006 and NLDB 2008, Tutorials Co-Chair at ICDM 2010, Workshops Co-Chair at ICDM 2011, PC Co-Chair of GfKl 2012 and Demo Track Co-Chair at ECML PKDD 2012.

Indrė Žliobaitė. Lecturer in computational intelligence at Bournemouth University, UK, and a research task leader within the INFER.eu project. Her research interests and competences center on online predictive modeling, context awareness and adaptation over time, and predictive analytics applications.