IEEE Big Data 2014

Big Data Stream Mining Tutorial

Presenters: Gianmarco De Francisci Morales, Joao Gama, Albert Bifet, and Wei Fan



Deriving insights from big data has been recognized as one of the most exciting and important opportunities for both academia and industry. Advanced analysis of big data streams is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution of such data streams over time, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. This tutorial is a gentle introduction to mining big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part discusses data stream mining on distributed engines such as Storm, S4, and Samza.




1. Fundamentals and Stream Mining Algorithms

     Stream mining setting

     Concept drift

     Classification and regression

     Frequent pattern mining

2. Distributed Big Data Stream Mining

     Distributed Stream Processing Engines





Short Bios


Gianmarco De Francisci Morales's Profile

Gianmarco De Francisci Morales is a Research Scientist at Yahoo Labs Barcelona. He received his Ph.D. in Computer Science and Engineering from the IMT Institute for Advanced Studies of Lucca in 2012. His research focuses on large-scale data mining and big data, with a particular emphasis on web mining and Data Intensive Scalable Computing systems. He is an active member of the open source community of the Apache Software Foundation, working on the Hadoop ecosystem, and a committer for the Apache Pig project. He is the co-leader of the SAMOA project, an open-source platform for mining big data streams.


Joao Gama's Profile

Joao Gama is a Researcher at LIAAD, University of Porto, working in the Machine Learning group. His main research interest is in Learning from Data Streams. He has published more than 80 articles. He served as Co-chair of ECML 2005, DS09, ADMA09 and a series of Workshops on KDDS and Knowledge Discovery from Sensor Data with ACM SIGKDD. He is serving as Co-Chair of the upcoming ECML-PKDD 2015. He is the author of a recent book on Knowledge Discovery from Data Streams.


Albert Bifet's Profile

Albert Bifet is a Research Scientist at Huawei. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams.


Wei Fan's Profile   

Wei Fan is the associate director of Huawei Noah’s Ark Lab. His main research interests and experience are in various areas of data mining and database systems, such as stream computing, high performance computing, extremely skewed distributions, cost-sensitive learning, risk analysis, ensemble methods, easy-to-use nonparametric methods, graph mining, predictive feature discovery, feature selection, sample selection bias, transfer learning, time series analysis, bioinformatics, social network analysis, novel applications, and commercial data mining systems. His co-authored paper received the ICDM 2006 Best Application Paper Award, and he led the team that used his Random Decision Tree method to win the 2008 ICDM Data Mining Cup Championship. He received the 2010 IBM Outstanding Technical Achievement Award for his contribution to IBM InfoSphere Streams. He is an associate editor of ACM Transactions on Knowledge Discovery from Data (TKDD). Since joining Huawei in August 2012, he has led the development of Huawei StreamSMART, a streaming platform for online and real-time processing, querying, and mining of very fast streaming data. He also led the development of a real-time processing and analysis platform for Mobile Broadband (MBB) data.

Albert Bifet,
Sep 24, 2015, 9:12 AM