IoT Big Data Stream Mining Tutorial
Presenters: Gianmarco De Francisci Morales, Albert Bifet, Latifur Khan, Joao Gama, and Wei Fan
Summary:
The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in IoT stream mining. This tutorial is a gentle introduction to mining IoT big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part deals with scalability issues inherent in IoT applications, and discusses how to mine data streams on distributed engines such as Spark, Flink, Storm, and Samza.
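As a rough illustration of what "concepts that drift" means for a stream learner, the sketch below runs a toy classifier in test-then-train order over a synthetic stream whose labelling rule flips halfway through, and resets the model when the recent error rate becomes too high. It is not taken from the tutorial material: the names make_stream, NearestMeanLearner, and run, the window size, and the error threshold are all hypothetical choices made for this example, and the windowed error check merely stands in for the proper drift detectors the tutorial covers.

```python
import random
from collections import deque


def make_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the labelling rule flips at index drift_at."""
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        # Before the drift y = 1 when x > 0.5; afterwards the rule is inverted.
        y = int(x > 0.5) if i < drift_at else int(x <= 0.5)
        yield x, y


class NearestMeanLearner:
    """Toy online classifier: predicts the class whose running mean is nearer."""

    def __init__(self):
        self.sums = [0.0, 0.0]
        self.counts = [0, 0]

    def predict(self, x):
        if 0 in self.counts:
            return 0  # default until both classes have been seen
        means = [s / c for s, c in zip(self.sums, self.counts)]
        return 0 if abs(x - means[0]) <= abs(x - means[1]) else 1

    def learn(self, x, y):
        self.sums[y] += x
        self.counts[y] += 1


def run(stream, window=50, threshold=0.5):
    """Test-then-train loop with a naive windowed-error drift check.

    Returns the stream indices at which the model was reset.
    """
    learner = NearestMeanLearner()
    recent = deque(maxlen=window)  # 1 = mistake, 0 = correct
    drifts = []
    for i, (x, y) in enumerate(stream):
        recent.append(int(learner.predict(x) != y))  # test first
        learner.learn(x, y)                          # then train
        if len(recent) == window and sum(recent) / window > threshold:
            drifts.append(i)                 # assumed drift: start over
            learner = NearestMeanLearner()
            recent.clear()
    return drifts


if __name__ == "__main__":
    print(run(make_stream(2000, drift_at=1000)))
```

With these settings the reset should occur within roughly one window of the point where the labelling rule changes. A fixed window and threshold is the simplest possible choice; adaptive detectors avoid having to tune either by hand.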
Content:
1. IoT Fundamentals and Stream Mining Algorithms
– IoT Stream mining setting
– Concept drift
– Classification and Regression
– Clustering
– Frequent Pattern mining
– Concept Evolution
– Limited Labeled Learning
2. IoT Distributed Big Data Stream Mining
– Distributed Stream Processing Engines
– Classification
– Regression
– Open Source Tools
– Applications
Short Bio
Gianmarco De Francisci Morales's Profile
Gianmarco De Francisci Morales is a Scientist at QCRI. Previously he worked as a Visiting Scientist at Aalto University in Helsinki, as a Research Scientist at Yahoo Labs in Barcelona, and as a Research Associate at ISTI-CNR in Pisa. He received his Ph.D. in Computer Science and Engineering from the IMT Institute for Advanced Studies of Lucca in 2012. His research focuses on scalable data mining, with an emphasis on Web mining and data-intensive scalable computing systems. He is an active member of the open source community of the Apache Software Foundation, working on the Hadoop ecosystem, and a committer for the Apache Pig project. He is one of the lead developers of Apache SAMOA, an open-source platform for mining big data streams. He regularly serves on the program committees of several major conferences in the area of data mining, including WSDM, KDD, CIKM, and WWW. He co-organizes the workshop series on Social News on the Web (SNOW), co-located with the WWW conference.
Albert Bifet's Profile
Albert Bifet is an Associate Professor at Telecom ParisTech and an Honorary Research Associate at the WEKA Machine Learning Group at the University of Waikato. Previously he worked at Huawei Noah's Ark Lab in Hong Kong, Yahoo Labs in Barcelona, the University of Waikato, and UPC BarcelonaTech. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He is serving as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2015, 2014, 2013, 2012) and the ACM SAC Data Streams Track (2016, 2015, 2014, 2013, 2012).
Latifur Khan's Profile
Latifur Khan is a full Professor (tenured) in the Computer Science department at the University of Texas at Dallas, where he has been teaching and conducting research since September 2000. He received his Ph.D. and M.S. degrees in Computer Science from the University of Southern California in August 2000 and December 1996, respectively. He has received prestigious awards, including the IEEE Technical Achievement Award for Intelligence and Security Informatics. Dr. Khan is an ACM Distinguished Scientist and a Senior Member of IEEE. He has chaired several conferences and serves (or has served) as associate editor on multiple editorial boards, including the IEEE Transactions on Knowledge and Data Engineering (TKDE) journal. He has conducted tutorial sessions at prominent conferences such as ACM WWW 2005, MIS2005, DASFAA 2007, and WI 2008 ("Matching Words and Pictures - Problems, Applications, and Progress") and PAKDD 2011 ("Data Stream Mining Challenges and Techniques").
Joao Gama's Profile
Joao Gama received his Ph.D. degree in Computer Science from the Faculty of Sciences of the University of Porto, Portugal, in 2000. He joined the Faculty of Economy, where he holds the position of Associate Professor. He is also a senior researcher and vice-director of LIAAD, a group belonging to INESC TEC. His main research interest is in learning from data streams. He has worked in several national and European projects on incremental and adaptive learning systems, ubiquitous knowledge discovery, learning from massive and structured data, etc. He served as Co-Program chair of ECML'2005, DS'2009, ADMA'2009, IDA'2011, and ECML-PKDD'2015. He served as track chair on Data Streams with ACM SAC from 2007 till 2016. He organized a series of workshops on Knowledge Discovery from Data Streams with ECML PKDD conferences and Knowledge Discovery from Sensor Data with ACM SIGKDD. He is the author of several books on data mining (in Portuguese) and of a monograph on Knowledge Discovery from Data Streams. He has authored more than 250 peer-reviewed papers in areas related to machine learning, data mining, and data streams. He is a member of the editorial boards of the international journals ML, DMKD, TKDE, IDA, NGC, and KAIS.
Wei Fan's Profile
Wei Fan is the Head of Baidu Research Big Data Lab. He received his Ph.D. in Computer Science from Columbia University in 2001. His main research interests and experience are in various areas of data mining and database systems, such as stream computing, high performance computing, extremely skewed distribution, cost-sensitive learning, risk analysis, ensemble methods, easy-to-use nonparametric methods, graph mining, predictive feature discovery, feature selection, sample selection bias, transfer learning, time series analysis, bioinformatics, social network analysis, novel applications, and commercial data mining systems. His co-authored paper received the ICDM'2006 Best Application Paper Award, and he led the team that used his Random Decision Tree method to win the 2008 ICDM Data Mining Cup Championship. He received the 2010 IBM Outstanding Technical Achievement Award for his contribution to IBM Infosphere Streams. He is an associate editor of ACM Transactions on Knowledge Discovery from Data (TKDD). At Huawei, he led his colleagues to develop Huawei StreamSMART, a streaming platform for online and real-time processing, query, and mining of very fast streaming data. In addition, he led his colleagues to develop a real-time processing and analysis platform for Mobile Broad Band (MBB) data.