ECML/PKDD 2014

Tutorial: The Pervasiveness of Machine Learning in Omics Science.

By Ronnie Alves*, Claude Pasquier and Nicolas Pasquier

* Contact author

Description:

Biology has become an enormously data-rich subject. Data is generated in many flavors, each reflecting the particularities of the omics perspective adopted in an experimental study. For instance, genomics is the field of study dealing with genomes, and it is mostly associated with the static view (the genes and where they are placed along the genome). The dynamic view comes from the transcriptomics perspective, that is, gene expression and its regulation. Finally, interactomics is usually associated with gene products (proteins) and their interactions; however, it can also be seen as a huge graph network with layers of interaction integrating distinct omics perspectives. Applications of unsupervised and/or supervised machine learning (ML) techniques in omics science abound in the literature. In this tutorial, we discuss machine learning on omics data, putting the emphasis on (i) mapping and (ii) learning omics patterns. We consider three main types of omics data: genomics, transcriptomics and interactomics. For each perspective, we present the biological problem, the data mapping (from a biological problem to a machine learning problem), the core ML methods employed and their implementation in the R language.

Conceptual map of ML methods for omics data (R packages/functions are highlighted in red):

Draft outline:

    • Introduction and overview of the omics science

    • Machine learning in genomics data – foundations, methods and applications

    • Machine learning in transcriptomics data – foundations, methods and applications

    • Machine learning in interactomics data – foundations, methods and applications

    • Outlook – summary and future challenges

Target audience:

The target groups are: graduate students with intermediate knowledge of machine learning/data mining (no previous knowledge of omics science is required); research scholars who work on machine learning to support biomedicine and health care; and practitioners who have applications on omics data.

Additional material:

All the slides are available here.

Author information:

Ronnie Alves received his PhD degree in Computer Science from the University of Minho, Portugal, in April 2008. While pursuing his PhD studies, he also served as a Visiting Researcher in the Data Mining Group of the DAIS Lab at the University of Illinois at Urbana-Champaign, and in the Bioinformatics Research Group at Pablo de Olavide University. He was awarded a Postdoctoral Fellowship from the French National Center for Scientific Research (CNRS) at the Institute of Biology Valrose from 2008 to 2010. He is now a Researcher in the Laboratory of Computer Science, Robotics and Microelectronics of Montpellier (LIRMM) and in the Institute of Computational Biology (IBC) at Université Montpellier 2. His current research focuses on data mining and machine learning on scientific data. He is also a member of the special committee on computational biology of the Brazilian Computer Society (SBC).

Email: alvesrco@gmail.com

Webpage: https://sites.google.com/site/alvesrco/

Claude Pasquier received his PhD degree in Computer Science from Nice - Sophia Antipolis University, France. He is a researcher in bioinformatics at the National Center for Scientific Research (CNRS). Prior to that, he was with the French National Institute for Research in Computer Science and Control (INRIA) and with Schlumberger (Smart Cards & Terminals division), where he conducted research on language-based systems. His current research mainly focuses on data mining, bioinformatics and knowledge-based systems.

Email: claude.pasquier@gmail.com

Webpage: http://claude.pasquier.net

Nicolas Pasquier received his PhD degree in Computer Science from the University Blaise Pascal of Clermont-Ferrand, France, in January 2000. He received the Junior Researcher Award from the INFORSID association on information systems, databases and decision systems in 2000 for his contribution to the design of efficient association rule extraction methods. He is now an assistant professor at the Université Nice Sophia-Antipolis and a member of the MinD (Mining Data) research project of the I3S Laboratory of Computer Science, Signals and Systems of Sophia Antipolis (CNRS UMR-6070) in France. His current research interests include bioinformatics, data mining, expert knowledge management, and knowledge discovery in databases.

Email: nicolas.pasquier@unice.fr

Webpage: http://www.i3s.unice.fr/~pasquier/web/