https://bintray.com/clustering4ever/C4E/clustering4ever/_latestVersion


Clustering4Ever (C4E) is a Big Data clustering library (API) that gathers clustering algorithms and quality indexes in Scala and Spark/Scala. Don't hesitate to ask questions or make recommendations in our Gitter. It is also available on Spark Packages.
Some examples using the C4E API are available here.
New: Deep Embedded Self-Organizing Map (DESOM) model (unsupervised deep learning). GitHub

Publications

2019

  • Gaël Beck, Tarn Duong, Mustapha Lebbah, Hanane Azzag, Christophe Cérin. A distributed approximate nearest neighbors algorithm for efficient large scale mean shift clustering. Journal of Parallel and Distributed Computing, 2019. https://doi.org/10.1016/j.jpdc.2019.07.015
  • Florent Forest, Mustapha Lebbah, Hanene Azzag, and Jérôme Lacaille. Deep Embedded SOM: joint representation learning and self-organization. The 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). Bruges, Belgium from 24 to 26 April 2019.
  • Florent Forest, Mustapha Lebbah, Hanene Azzag and Jérôme Lacaille. Deep Architectures for Joint Clustering and Visualization with Self-Organizing Maps. LDRC@PAKDD 2019 (Learning Data Representation for Clustering@PAKDD) Macau China. April 14-17, 2019.
  • Dina Faneva Andriantsiory, Mustapha Lebbah, Hanane Azzag and Gael Beck. Algorithms for an Efficient Tensor Biclustering. LDRC@PAKDD 2019 (Learning Data Representation for Clustering@PAKDD) Macau China. April 14-17, 2019. arXiv
  • Gaël Beck, Tarn Duong, Mustapha Lebbah, Hanane Azzag, Christophe Cérin. A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering. arXiv 
  • Gaël Beck, Tarn Duong, Mustapha Lebbah, Hanane Azzag. Nearest Neighbor Median Shift Clustering for Binary Data. arXiv
     

2018

  • Florent Forest, Jérôme Lacaille, Mustapha Lebbah, Hanene Azzag: A Generic and Scalable Pipeline for Large-Scale Analytics of Continuous Aircraft Engine Data. IEEE International Conference on BigData 2018: 1918-1924
  • Mohammed Ghesmoune, Mustapha Lebbah, Hanane Azzag, Salima Benbernou, Mourad Ouziri, Tarn Duong: A Complete Data Science Work-flow For Insurance Field. IEEE International Conference on Big Data 2018: 1925-1930
  • Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Hanene Azzag, Mustapha Lebbah: A Distributed Rough Set Theory Algorithm based on Locality Sensitive Hashing for an Efficient Big Data Pre-processing.  IEEE International Conference on Big Data  2018: 2597-2606
  • Leila Abidi, Hanane Azzag, Salima Benbernou, Mehdi Bentounsi, Christophe Cérin, Tarn Duong, Philippe Garteiser, Mustapha Lebbah, Mourad Ouziri, Soror Sahri, and Michel Smadja. A Big Data Platform for Enhancing Life Imaging Activities. Book Chapter  in Utilizing Big Data Paradigms for Business Intelligence. 2018, IGI Global.
  • Nhat-Quang Doan, Hanane Azzag and Mustapha Lebbah. Hierarchical Laplacian Score for unsupervised feature selection. IEEE World Congress on Computational Intelligence, IEEE International Joint Conference on Neural Network (IEEE IJCNN), 8-13 July 2018, Rio de Janeiro, Brazil.
  • Gaël Beck, Hanane Azzag, Stephanie Bougeard, Ndèye Niang and Mustapha Lebbah. A New Micro-Batch Approach for Partial Least Square Clusterwise Regression. The 3rd INNS Conference on Big Data and Deep Learning (INNS BDDL), April 17-19, 2018, Sanur, Bali, Indonesia.

2017

  • Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah: A distributed rough set theory based algorithm for an efficient big data pre-processing under the spark framework. BigData 2017: 911-916
  • Christophe Cérin, Jean-Luc Gaudiot, Mustapha Lebbah, Fouste Yuehgoh: Return of experience on the mean-shift clustering for heterogeneous architecture use case. BigData 2017: 3499-3507
  • Hippolyte Leger, Dominique Bouthinon, Mustapha Lebbah, Hanene Azzag. Nouveau modèle pour un passage à l'échelle de la θ-subsomption. In EGC 2017 , vol. RNTI-E-33, pp.339-344.

2016


  • Tarn Duong, Gael Beck, Hanene Azzag, Mustapha Lebbah. Nearest neighbour estimators of density derivatives, with application to mean shift clustering. Pattern Recognition Letters. DOI: http://dx.doi.org/10.1016/j.patrec.2016.06.021
  • (poster) Salima Benbernou, Mehdi Bentounsi, Pierre Bourdoncle, Mustapha Lebbah, Mourad Ouziri, et al.. Towards Big Data in Medical Imaging. Symposium IDV - Imageries du Vivant, Jan 2016, Cap Hornu, France.

2015


  • Mohammed Ghesmoune, Mustapha Lebbah, Hanene Azzag. Micro-Batching Growing Neural Gas for Clustering Data Streams using Spark Streaming. Procedia Computer Science journal (2015), pp. 158-166. DOI: 10.1016/j.procs.2015.07.290. (Paper presented at the INNS Conference on Big Data, 8-10 August 2015, San Francisco, USA.) PDF
  • Nhat-Quang Doan, Mohammed Ghesmoune, Hanane Azzag, and Mustapha Lebbah - Growing Hierarchical Trees for Data Stream Clustering and Visualization - International Joint Conference on Neural Networks (IJCNN 2015), July 12–17, 2015, Killarney, Ireland. PDF. DOI:10.1109/IJCNN.2015.7280397
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanene Azzag. Clustering over data streams based on growing neural gas. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining. PAKDD (2) 2015: 134-145. PDF
  • Nesrine Masmoudi, Hanane Azzag, Mustapha Lebbah, Cyrille Bertelle and Maher Ben Jemaa.  How to Use Ants for Data Stream Clustering. IEEE Congress on Evolutionary Computation (IEEE CEC 2015)


  • Mohammed Ghesmoune, Mustapha Lebbah, Hanane Azzag. Clustering topologique pour le flux de données. In EGC 2015, vol. RNTI-E-28, pp.137-142
  • Tugdual Sarazin, Hanane Azzag, Mustapha Lebbah. Modèle de Biclustering dans un paradigme "Mapreduce". In EGC 2015, vol. RNTI-E-28, pp.467-468


2014

  • Mohammed Ghesmoune, Hanene Azzag, and Mustapha Lebbah. G-stream : Growing neural gas over data stream. In Neural Information Processing - 21st International Conference, ICONIP 2014, Kuching, Malaysia, November 3-6, 2014. Proceedings, Part I, volume 8834 of Lecture Notes in Computer Science, pages 207–214. Springer, 2014.
  • Tugdual Sarazin, Mustapha Lebbah, and Hanane Azzag. Biclustering using spark-mapreduce. In 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, October 27-30, 2014, pages 58–60, 2014.
  • Tugdual Sarazin, Mustapha Lebbah, Hanane Azzag, and Amine Chaibi. Feature group weighting and topological biclustering. In Neural Information Processing - 21st International Conference, ICONIP 2014, Kuching, Malaysia, November 3-6, 2014. Proceedings, Part II, volume 8835 of Lecture Notes in Computer Science, pages 369–376. Springer, 2014.
  • Tugdual Sarazin, Hanane Azzag, and Mustapha Lebbah. 2014. SOM Clustering Using Spark-MapReduce. In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW '14). IEEE Computer Society, Washington, DC, USA, 1727-1734. DOI=10.1109/IPDPSW.2014.192
  • Salima Benbernou and Mustapha Lebbah. Workflows and scientific big data preservation. In Cristinel Diaconu, editor, Scientific data preservation, pages 38–41. 2014. https://martwiki.in2p3.fr/twiki/pub/PREDON/WebHome/PREDON-VECTO-BD.pdf
  • Christophe Cérin, Mustapha Lebbah, and Hanane Azzag. Cloud and grid methodologies for data management and preservation. In Cristinel Diaconu, editor, Scientific data preservation, pages 49–54. 2014. https://martwiki.in2p3.fr/twiki/pub/PREDON/WebHome/PREDON-VECTO-BD.pdf
  • Amine Chaibi, Hanane Azzag, Mustapha Lebbah. Pondération de blocs de variables en bi-partitionnement topologique. In Proc. of the EGC'14, 28-31 January 2014, Rennes. RNTI, Revue des Nouvelles Technologies de l'Information, Editions Hermann.

2013



  • Nhat-Quang Doan, Hanane Azzag, and Mustapha Lebbah. Growing Self-organizing Trees for Autonomous Hierarchical Clustering,  Neural Networks. Special Issue on Autonomous Learning. Volume 41, May 2013, Pages 85–95. Elsevier.
  • Nesrine Masmoudi, Hanane Azzag, Mustapha Lebbah and Cyrille Bertelle. Clustering using chemical and colonial odors of real ants. The Fifth World Congress on Nature and Biologically Inspired Computing (NaBIC2013), Fargo, USA, August 12-14, 2013.
  • Amine Chaibi, Mustapha Lebbah and Hanane Azzag. Group Outlier Factor: a new score using Self-Organising Map for Group-Outlier and Novelty Detection. International Journal of Computational Intelligence and Applications (IJCIA), World Scientific Publishing Company, 2013. DOI: 10.1142/S1469026813500107. http://dx.doi.org/10.1142/S1469026813500107
  • Amine Chaibi, Mustapha Lebbah, Hanane Azzag: A new bi-clustering approach using topological maps. IJCNN 2013: 1-7. August 4–9, 2013 Dallas
  • Hanane Azzag, Mustapha Lebbah: A New Way for Hierarchical and Topological Clustering. Advances in Knowledge Discovery and Management 2013 Vol 3: 85-97

2012


  • Nhat-Quang Doan, Hanane Azzag and Mustapha Lebbah. Growing Self-organizing Trees for Knowledge Discovery from Data. IEEE World Congress on Computational Intelligence (IEEE WCCI 2012). International Joint Conference on Neural Networks (IJCNN 2012). pages 251 - 258. June 10-15, 2012. Brisbane. Australia


2011


  • JAZIRI R., LEBBAH M., ROGOVSCHI N., BENNANI Y. (2011). «Probabilistic Self-Organizing Maps for Multivariate Sequences», in Proc. IJCNN'2011, IEEE International Joint Conference on Neural Network, pages 851-858, San Jose, California, July 31 - August 5, 2011. [PDF]

  • JAZIRI R., LEBBAH M., BENNANI Y., CHENOT J.H. (2011), “Apprentissage non supervisé des structures des HMMs”, in Proc. SFDS, 43éme Journées de Statistiques, Gammarth, Tunisie, 23-27 Mai 2011. 

  • ROGOVSCHI N., LEBBAH M., BENNANI Y. Modèles de mélanges topologiques pour la classification de données catégorielles et mixtes. Numéro spécial: Fouille de données complexes - Complexité liée aux données multiples, RNTI 2011-E-21 (Revue des Nouvelles Technologies de l'Information), p. 53-80. Editions Hermann

2010

  • M. Lebbah and H. Azzag, “Topological hierarchical tree using artificial ants,” in Neural Information Processing. Theory and Algorithms, ser. Lecture Notes in Computer Science, K. Wong, B. Mendis, and A. Bouzerdoum, Eds. Springer Berlin / Heidelberg, 2010, vol. 6443, pp. 652–659, DOI: 10.1007/978-3-642-17537-4_79
  • HAMDI F., LEBBAH M., BENNANI Y. (2010), «Topographic Under-Sampling for Unbalanced Distributions». IJCNN'10, International Joint Conference on Neural Network. IEEE World Congress on Computational Intelligence, pages 18-23, 18-23 July 2010, Barcelona, Spain.
  • H. Azzag, M. Lebbah, A. Arfaoui. Map-TreeMaps : A new approach for hierarchical and topological clustering. Machine Learning and Applications. IEEE-ICMLA 2010: The Ninth International Conference, pages 873--878. Washington DC, USA,  December 12-14, 2010.
  • ROGOVSCHI N., LEBBAH M., BENNANI Y. (2010), «Learning Self-Organizing Mixture Markov Models». Journal of Nonlinear Systems and Applications (JNSA), ISSN 1918-3704. Pages 63-71. Volume 1, Number 1-2. Published by Watam Press, Canada.
  • GROZAVU N., BENNANI Y., LEBBAH M. (2010),  «Cluster-dependent features selection through a weighting learning paradigm», (eds) Advances in Knowledge Discovery and Management, Series: Studies in Computational Intelligence, Berlin: Springer. Vol. 292, 2010, Springer. ISBN: 978-3-642-00579-4, DOI: 10.1007/978-3-642-00580-0.
  • LEBBAH M., BENABDESLEM K. Visualization and Clustering of Categorical Data with Probabilistic Self-Organizing Map. Neural Computing and Applications journal. DOI 10.1007/s00521-009-0299-2. 19(1) 2010.
  • Nicoleta Rogovschi, Mustapha Lebbah, Younès Bennani. "A weighted Self-Organizing Map for mixed continuous and categorical data", pp. 47-52. The 5th International Conference on Neural Network and Artificial Intelligence (ICNNAI'2010), 1-4 June 2010, Brest State Technical University, Brest, Belarus.


  • H. Azzag, M. Lebbah. Classification topologique et points d’intérêt, XVIIe Rencontre de la Société Francophone de Classification (SFC'2010), p. 37-40, June 2010.
  • Hanane Azzag et Mustapha Lebbah. Auto-organisation topologique et hiérarchique des données. Conférence Internationale Francophone sur l'Extraction et la Gestion des Connaissances. EGC’10.  Hammamat-Tunisie,  27- 30 Janvier 2010. RNTI, Revue des Nouvelles Technologies de l'Information, p 555-560. Editions Cépaduès

2009



  • AZZAG H., LEBBAH M. «A new approach for auto-organizing a groups of artificial ants», in Proc. Springer LNCS (Lecture Notes in Computer Science) of ECAL'2009, 10th European Conference on Artificial Life. Budapest, September 13-16, 2009.
  • GROZAVU N., BENNANI Y., LEBBAH M. (2009),  «From variable weighting to cluster characterization in topographic unsupervised learning», in Proc. IJCNN ‘09, International Joint Conference on Neural Network. p1005 - 1010.  14-19 June 2009, Atlanta, Georgia.
  • LEBBAH M., BENNANI Y., BENHADDA H., GROZAVU N. Relational Analysis for Clustering Consensus. Invited book chapter, accepted for publication in the book "Machine Learning", ISBN 978-953-7619-X-X. IN-TECH Publisher, 2009


  • ROGOVSCHI N., LEBBAH M., BENNANI Y. (2009),  «Un algorithme pour la classification topographique simultanée de données qualitatives et quantitatives», CAp’09 : Conférence francophone sur l'apprentissage automatique, p209-224. Plate-forme AFIA, 25-29 Mai, Hammamat-Tunisie.
  • GROZAVU N., BENNANI Y., LEBBAH M. (2009),  «Caractérisation automatique des classes découvertes en classification non supervisée», Proc. of the EGC'09, p 43-54 Strasbourg, 27- 30 Janvier 2009 – RNTI, Revue des Nouvelles Technologies de l'Information, Editions Cépaduès.


2008


  • Mustapha Lebbah, Younès Bennani, Nicoleta Rogovschi: A Probabilistic Self-Organizing Map for Binary Data Topographic Clustering. International Journal of Computational Intelligence and Applications 7(4): 363-383 (2008)
  • Mustapha Lebbah, Younes Bennani and Hamid Benhadda. Relational Analysis for Consensus Clustering from Multiple Partitions. Machine Learning and Applications. ICMLA 2008: Seventh International Conference. pp. 218-223. San Diego, California, December 11-13, 2008.
  •  Nicoleta Rogovschi, Mustapha Lebbah, and Younes Bennani. Probabilistic Mixed Topological Map for Categorical and Continuous Data.  Machine Learning and Applications. ICMLA 2008: Seventh International Conference. pp 224-231. San Diego, California,  December 11-13, 2008.
  • AZZAG H., LEBBAH M. Clustering of Self-Organizing Map. ESANN'2008 proceedings - European Symposium on Artificial Neural Networks. Pages 209-214. Bruges (Belgium), 23-25 April 2008.
  • Julien Brajard, Cédric Duboudin, Hanane Bénaribi, Mustapha Lebbah et Sylvie Thiria. Typologie des logements et lien avec la multipollution. Colloque "Comment concilier énergie, qualité de l’air intérieur et santé". Pendant le salon Pollutec. 4 DECEMBRE 2008. [PDF]


  • BADRAN F., LEBBAH M., THIRIA S. Chapter 7: Cartes auto-organisatrices et classification automatique. In the book Apprentissage statistique. Publisher: Eyrolles. Oct 2008.
  • LEBBAH M, AZZAG H. Segmentation hiérarchique des cartes topologiques. 8ème journées francophones :Extraction et gestion des Connaissances, EGC’08, Nice, Janvier 2008. (Revue des Nouvelles Technologies de l'Information),
  • Nistor GROZAVU, Younès BENNANI, Mustapha Lebbah. Pondération locale des variables en apprentissage numérique non-supervisé. 8ème journées francophones :Extraction et gestion des Connaissances, EGC’08, Nice, Janvier 2008. (Revue des Nouvelles Technologies de l'Information)
  • BENNANI Y., LEBBAH M., GROZAVU N., MARCOTORCHINO J.F., BENHADDA H., LORIN S. (2008), «Analyse relationnelle comme algorithme de fusion de partitionnement», Workshop Infomagic, Analyse multimodale de l'information, Pôle de compétitivité Cap digital, 10 juin 2008 Telecom-ParisTech.

2007



  • Mustapha Lebbah, Nicoleta Rogovschi and Younes Bennani. BeSOM: Bernoulli on Self Organizing Map. International Joint Conference on Neural Networks, IJCNN 2007, Celebrating 20 years of neural networks, Orlando, Florida, USA, August 12-17, 2007. Pages 631-636. [pdf]
  • Khalid Benabdeslem and Mustapha Lebbah. Feature selection for Self Organizing Map. International Conference on Information Technology Interface-ITI 2007. June 25-28, 2007, p 45-50, Cavtat-Dubrovnik,Croatia.
  • Arnaud Quesney, Eric Jeansou, Christian Ruiz, Nathalie Steunou, Bruno Cugny, Nicolas Picot, Jean-Claude Souyris, Sylvie Thiria, Mustapha Lebbah. Unsupervised Classification Of Altimetric Waveform Over All Surface Type. Ocean Surface Topography Science Team Meeting. OSTST 2007.[quesney_SL-4543.pdf]
  • Khalid Benabdeslem, Mustapha Lebbah, Alexandre Aussem, Nadjim Chelghoum, Marylis Corbex. Learning based system for knowledge discovery from Nasopharyngeal cancer data. Colloque sur l'Optimisation et les Systèmes d'Information COSI 2007. Pages 533-542. 11-13 juin 2007 - Oran Algérie.
  • AZZAG H., BENNANI Y., LEBBAH M. «A Stochastic algorithm for clustering inspired from cockroaches behavior», Proceedings of the ICMCS'07, September 19-21, 2007, Chisinau, Moldavie.


  • Mustapha Lebbah, Mohamed Ramzi Temanni, Christine Poitou-Bernert, Karine Clement, Jean-Daniel Zucker. Partitionnement des données pour les problèmes de classement difficiles: Combinaison des cartes topologiques mixtes et SVM. Numéro spécial Apprentissage Artificiel et Fouille de Données, RNTI 2007 (Revue des Nouvelles Technologies de l'Information), p. 34-54.
  • LEBBAH M., ROGOVSCHI N., BENNANI Y. (2007), «BeSOM: Bernoulli on Self-Organizing Map», Cap’2007: conférence francophone sur l'apprentissage automatique, Plate-forme AFIA, 2-6 July, Grenoble. Nominated paper.
  • Mohamed Ramzi Temanni, Mustapha Lebbah, Christine Poitou-Bernert, Karine Clement, Jean-Daniel Zucker. Combinaison des cartes topologiques mixtes et des machines à vecteurs de support pour la prédiction de perte de poids chez les obèses. Extraction et Gestion des Connaissances (EGC 2007), Namur. RNTI-E-9 (Revue des Nouvelles Technologies de l'Information), Cépaduès-Éditions, 2007, Volume I, p. 33-44.
  • K. Benabdeslem, M. Lebbah, A. Aussem et M. Corbex. Approche connexionniste pour l’extraction de profils cas-témoins du cancer du Nasopharynx à partir des données issues d’une étude épidémiologique. Extraction et gestion des Connaissances, EGC’07, pp 445 - 454, Namur, Janvier 2007.(Revue des Nouvelles Technologies de l'Information),

2006


  • SARACENO M, PROVOST C, LEBBAH M. Biophysical regions identification using an artificial neuronal network: a case study in the South Western Atlantic. Advances in Space Research. Volume 37, Issue 4, Natural Hazards and Oceanographic Processes from Satellite Data, 2006, Pages 793-805.[pdf]
  • LEBBAH M, THIRIA S, BADRAN F. Les perceptrons multi-couches. Published in late 2006 by Editions Hermès. [pdf]
  • Mohamed Ramzi Temanni, Mustapha Lebbah, Christine Poitou-Bernert, Karine Clement, Jean-Daniel Zucker. Combining mixed topological maps and SVM to improve accuracy of hard classification problems: application to the biomedical data. IPG 2006 (Integrative Post-Genomics). [pdf]

2005


    • LEBBAH M, CHAZOTTES A, THIRIA S, BADRAN F. Mixed Topological Map. ESANN 2005 proceedings-European Symposium on Artificial Neural Networks. Bruges, April 26-29. p 357-362.[pdf]
    • SARACENO M, PROVOST C, LEBBAH M, Piola A.R. Biophysical regions in the South Western Atlantic as seen by remote sensing data using artificial neuronal network: a case study. EGU (European Geosciences Union ) 2005.


    • LEBBAH M, THIRIA S, BADRAN F. Visualisation et classification avec les cartes topologiques catégorielles. Revue des Nouvelles Technologies de l'Information, Cépaduès RNTI-E-4. Numéro spécial sur la fouille de données complexes. Novembre 2005.
    • LEBBAH M, THIRIA S, BADRAN F. Carte topologiques mixtes. COURS ET ATELIERS n°9, Fouille de données complexes dans un processus d’extraction de connaissances. EGC’2005.

    2004


    • LEBBAH M, THIRIA S, BADRAN F. Visualisation avec les cartes topologiques catégorielle. COURS ET ATELIERS n°6, Fouille de données complexes dans un processus d’extraction de connaissances. EGC’2004.

    2000-2003



    • LEBBAH M, THIRIA S, BADRAN F, CHABANON C, Categorical Topological map, IEEE International Conference on Artificial Neural Network ( ICANN 2002), Madrid 2002, Proceedings. pp 890-895.Volume 2415/2002, Lecture Notes in Computer Science, LNCS [pdf]
    • Carte topologique pour données qualitatives: application à la reconnaissance automatique de la densité du trafic routier. Doctoral thesis. Université de Versailles Saint-Quentin-en-Yvelines. May 2003. [pdf]
    • LEBBAH M, THIRIA S, BADRAN F. Carte topologique et données binaires. 32èmes Journées, Mai 16..25/ 2000, société française des statistiques, Fes 2000.