Infotopo Software

"When you use the word information, you should rather use the word form" R.Thom

"Understanding is Compressing" G.Chaitin - P. Grassberger

Data are resources, knowledge is energy,

discover, transform and share them freely...

InfoTopo: Topological Information Data Analysis. Deep statistical unsupervised and supervised learning.


InfoTopo is a Machine Learning method based on Information Cohomology, a cohomology of statistical systems [1,8,9].

It estimates higher-order statistical structures, dependences and (refined) independences, or generalised (possibly non-linear) correlations, and uncovers their structure as a simplicial complex.

It provides estimations of the basic information functions: entropy, joint and conditional entropy, multivariate Mutual Informations (MI), conditional MI, Total Correlations...
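As an illustration of the functions listed above, here is a minimal sketch of plug-in (empirical frequency) estimators for discrete samples. It uses only the Python standard library and does not reproduce the INFOTOPO API; all function names are hypothetical and chosen for this example.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    m = len(samples)
    return -sum(c / m * log2(c / m) for c in counts.values())


def joint_entropy(*columns):
    """H(X1,...,Xk): entropy of the tuples formed row by row."""
    return entropy(list(zip(*columns)))


def conditional_entropy(y, x):
    """H(Y|X) = H(X,Y) - H(X)."""
    return joint_entropy(x, y) - entropy(x)


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - entropy(z))


def total_correlation(*columns):
    """Sum of the marginal entropies minus the joint entropy."""
    return sum(entropy(c) for c in columns) - joint_entropy(*columns)


# Toy data set with m = 4 trials: y copies x, z is independent of both.
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
z = [0, 1, 0, 1]

print(entropy(x), mutual_information(x, y), mutual_information(x, z))
print(total_correlation(x, y, z))
```

With so few trials these plug-in estimates are only meaningful on toy data; the sample size needed grows quickly with the number of variables considered jointly.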

InfoTopo is at the crossroads of Topological Data Analysis, Deep Neural Network learning, statistical physics and complex systems:

1. With respect to Topological Data Analysis (TDA), it provides intrinsically probabilistic methods that do not assume a metric (the alphabets of the random variables are not necessarily ordinal) [2,3,6].

2. With respect to Deep Neural Networks (DNN), it provides a DNN structure constrained by a simplicial complex, with topologically derived unsupervised and supervised learning rules (forward propagation, differential statistical operators). The neurons are random variables, and the depth of the layers corresponds to the dimensions of the complex [3,4,5].

3. With respect to statistical physics, it provides generalized correlation functions, free and internal energy functions, and estimations of the contributions of n-body interactions to the energy functional, which hold in the non-homogeneous and finite-discrete case, without mean-field assumptions. The cohomological complex implements the minimum free-energy principle. Information Topology is rooted in cognitive sciences and computational neurosciences, and generalizes and unifies some consciousness theories [5].

4. With respect to complex systems studies, it generalizes complex networks and probabilistic graphical models to interactions of higher degree-dimension [2,3].
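To make the idea of higher-degree interactions concrete, the following sketch uses the classic XOR triple, a textbook example rather than one taken from the papers above: every pairwise mutual information vanishes, so a pairwise network sees nothing, while the 3-variable mutual information I3, written as an alternating sum of joint entropies, is non-zero. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    m = len(samples)
    return -sum(c / m * log2(c / m) for c in counts.values())


def H(*columns):
    """Joint entropy of one or more columns of observations."""
    return entropy(list(zip(*columns)))


# The four equiprobable (x, y) pairs, with z = x XOR y.
rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
x, y, z = zip(*rows)

# Pairwise mutual informations: I(A;B) = H(A) + H(B) - H(A,B).
pairwise = {
    "I(X;Y)": H(x) + H(y) - H(x, y),
    "I(X;Z)": H(x) + H(z) - H(x, z),
    "I(Y;Z)": H(y) + H(z) - H(y, z),
}

# I3 = sum of H1 - sum of H2 + H3 (alternating sum over subsets).
i3 = (H(x) + H(y) + H(z)
      - H(x, y) - H(x, z) - H(y, z)
      + H(x, y, z))

print(pairwise)  # all three values are 0.0
print(i3)        # -1.0 bit: a purely 3-body (synergistic) dependence
```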


It assumes basically:

1. a classical probability space (here a discrete finite sample space), geometrically formalized as a probability simplex with basic conditioning and Bayes rule, and implementing

2. a complex (here simplicial) of random variables with a joint operator

3. a quite generic coboundary operator (Hochschild, Homological algebra with a (left) action of conditional expectation)
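The three ingredients above can be checked numerically in degree one. In the sketch below, the action of X on H(Y) is taken to be the conditional expectation X.H(Y) = sum over x of P(x) H(Y | X = x), and the Hochschild-type coboundary of a 1-cochain F is written (dF)(X;Y) = X.F(Y) - F(X,Y) + F(X). With F = H, the cocycle condition dH = 0 is the chain rule of entropy. The joint distribution and all function names are assumptions made for this illustration.

```python
from math import log2


def H(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal_x(joint):
    """Marginal law of X from a joint law given as {(x, y): probability}."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px


def condition_on_x(joint, x0, px):
    """Bayes rule: P(Y = y | X = x0) = P(x0, y) / P(x0)."""
    return {y: p / px[x0] for (x, y), p in joint.items() if x == x0}


def action(joint):
    """X.H(Y): expectation over X of the entropy of Y conditioned on X = x."""
    px = marginal_x(joint)
    return sum(px[x0] * H(condition_on_x(joint, x0, px))
               for x0 in px if px[x0] > 0)


def coboundary_of_entropy(joint):
    """(dH)(X;Y) = X.H(Y) - H(X,Y) + H(X)."""
    return action(joint) - H(joint) + H(marginal_x(joint))


# Hypothetical joint law on a finite sample space; X is not ordinal.
joint = {("a", 0): 0.1, ("a", 1): 0.3, ("b", 0): 0.4, ("b", 1): 0.2}

delta = coboundary_of_entropy(joint)
print(abs(delta) < 1e-12)  # True: entropy satisfies the 1-cocycle condition
```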


The details for the underlying mathematics and methods can be found in the papers:

[1] Vigneaux J., Topology of Statistical Systems. A Cohomological Approach to Information Theory. Ph.D. Thesis, Paris 7 Diderot University, Paris, France, June 2019. PDF

[2] Baudot P., Tapia M., Bennequin D., Goaillard J.M., Topological Information Data Analysis. Entropy, 2019, 21(9), 869. PDF

[3] Baudot P., The Poincaré-Shannon Machine: Statistical Physics and Machine Learning aspects of Information Cohomology. Entropy, 2019, 21(9). PDF

[4] Baudot P., Bernardi M., The Poincaré-Boltzmann Machine: passing the information between disciplines. ENAC Toulouse, France, 2019. PDF

[5] Baudot P., Bernardi M., Information Cohomology methods for learning the statistical structures of data. DS3 Data Science, Ecole Polytechnique, 2019. PDF

[6] Tapia M., Baudot P., Dufour M., Formizano-Treziny C., Temporal S., Lasserre M., Kobayashi K., Goaillard J.M., Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons. Scientific Reports, 2018. PDF

[7] Baudot P., Elements of qualitative cognition: an Information Topology Perspective. Physics of Life Reviews. 2019. extended version on Arxiv. PDF

[8] Baudot P., Bennequin D., The homological nature of entropy. Entropy, 2015, 17, 1-66; doi:10.3390. PDF

[9] Baudot P., Bennequin D., Topological forms of information. AIP conf. Proc., 2015. 1641, 213. PDF

New release:

Documentation, installation and tutorial (Read the Docs)

github INFOTOPO version 0.1 (new release)

The previous version of the INFOTOPO software for Information Topology Data Analysis (the 2017 scripts) is available at Github infotopo

The INFOTOPO library is a generic open source suite of Python programs (compatible with Python 3.4.x, on Linux, Windows, or Mac) for Information Topological Data Analysis. It is distributed freely under the open source GNU GPL v3 License and is available in a Github repository. The library offers state-of-the-art analysis of high-dimensional statistical data structures, with algorithms to detect covarying patterns and clusters and to perform multiscale data analysis: what we called the "Poincaré-Shannon Machine" in tribute to some original works in unsupervised machine learning.

An introduction to "How it works" is given Here. For a more mathematical and precise presentation see Here.

To answer common questions, this method is quite different from usual topological data analysis and persistence-based methods because:

_ first, it is intrinsically based on probability and statistics: the usual points and/or fuzzy/open sets of a manifold are replaced by probability densities, and open covers by random variables, and as a result we study the homology of the statistics

_ second, as a consequence it does not rely on metric space assumptions: you can handle random variables taking values such as "beautiful" or "ugly" together with variables such as position or mass (...)

_ third, it does not rely on a one-dimensional Vietoris-Rips complex approximation but rather considers all the combinatorics of the Čech complex (implemented here as the mutual-information - free-energy complex). It is hence computationally very costly but exhaustive (currently feasible for small data sets; fast approximations for arbitrary dimensions are under development).
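The second point can be illustrated with a short sketch: mutual information depends only on the joint frequencies of the symbols, so it is unchanged by any one-to-one relabelling of an alphabet and needs neither an order nor a distance on it. The data values and names below are invented for this example, and the numeric variable is treated as a discrete symbol (a continuous measurement would first have to be discretized).

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    m = len(samples)
    return -sum(c / m * log2(c / m) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# A categorical variable with no metric, paired with a numeric one.
looks = ["beautiful", "ugly", "beautiful", "ugly", "beautiful", "ugly"]
mass = [1.2, 3.4, 1.2, 3.4, 1.2, 1.2]

# Any one-to-one relabelling of the alphabet (here one that also reverses
# the alphabetical order) leaves the joint frequencies unchanged.
relabel = {"beautiful": "B", "ugly": "A"}
looks_relabelled = [relabel[v] for v in looks]

mi_original = mutual_information(looks, mass)
mi_relabelled = mutual_information(looks_relabelled, mass)

print(mi_original > 0)                           # the variables are dependent
print(abs(mi_original - mi_relabelled) < 1e-12)  # invariant under relabelling
```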

It computes all multivariate information functions: entropy, joint entropy between k random variables (Hk), mutual informations between k random variables (Ik), conditional entropies and conditional mutual informations. It provides their cohomological (and homotopy) visualisation in the form of information landscapes and information paths, together with an approximation of the minimum information energy complex [1]. It is applicable to any set of empirical data, that is, data with several trials or repetitions (parameter m), and also allows computing the undersampling regime, the degree k above which the sample size m is too small to provide good estimations of the information functions [1].

The computational exploration is restricted to the simplicial sublattice of random variables (all the subsets of k ≤ n random variables) and hence has a complexity in O(2^n). In this simplicial setting we can exhaustively estimate information functions on the simplicial information structure, that is, joint entropy Hk and mutual informations Ik at all degrees k ≤ n and for every k-tuple, with a standard commercial personal computer (a laptop with processor Intel Core i7-4910MQ CPU @ 2.90GHz * 8) up to k=n=21 in reasonable time (about 3 hours). The mathematical formalism can be found in [1,2,3,6], and its application as a neuroscience and data analysis method can be found in [1,4,5,6].
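The exhaustive exploration described above can be sketched as follows: enumerate the 2^n - 1 non-empty subsets of the n variables, estimate Hk on each, and obtain Ik as the alternating sum of the entropies of the sub-tuples. This is a naive illustration, not the INFOTOPO implementation. The undersampling criterion shown (the smallest degree k at which a given fraction of the k-tuples reaches the maximal entropy log2(m)) is an assumption made for this example, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    m = len(samples)
    return -sum(c / m * log2(c / m) for c in counts.values())


def information_functions(columns):
    """Return (H, I): dicts mapping each non-empty tuple of variable
    indices to its joint entropy Hk and its mutual information Ik."""
    n = len(columns)
    H = {}
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            rows = list(zip(*(columns[i] for i in subset)))
            H[subset] = entropy(rows)
    I = {}
    for subset in H:
        total = 0.0
        # Ik = sum over non-empty sub-tuples S of (-1)^(|S| - 1) * H(S).
        for size in range(1, len(subset) + 1):
            sign = (-1) ** (size - 1)
            for sub in combinations(subset, size):
                total += sign * H[sub]
        I[subset] = total
    return H, I


def undersampling_degree(H, m, fraction=0.05, tol=1e-9):
    """Assumed criterion: smallest k at which at least `fraction` of the
    k-tuples have Hk equal to log2(m), the maximum reachable with m trials."""
    by_degree = {}
    for subset, h in H.items():
        by_degree.setdefault(len(subset), []).append(h)
    for k in sorted(by_degree):
        saturated = sum(1 for h in by_degree[k] if h >= log2(m) - tol)
        if saturated / len(by_degree[k]) >= fraction:
            return k
    return None


# Toy data: n = 3 variables, m = 4 trials, third column = XOR of the others.
columns = [(0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0)]
H, I = information_functions(columns)

print(len(H))                       # 2**3 - 1 = 7 subsets explored
print(I[(0, 1, 2)])                 # -1.0
print(undersampling_degree(H, 4))   # 2: pairs already saturate log2(4) bits
```

The number of subsets doubles with each added variable, which is why the exhaustive computation is limited to a small n.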

[3] Categories and Physics 2011. Classic and quantum Information topos.

[4] Random models in Neuroscience 2012 . Information Topology I and II. PDF

[5] International Conference on Mathematical NeuroScience ICMNS 2015. Poster: Information topology, Neural dynamics and adaptation. PDF

[6] Information Topology: Statistical Physic of Complex Systems and Data Analysis - Topological and geometrical structures of information, CIRM Luminy, France. 27-1 sept VIDEO-SLIDE

The INFOTOPO library is developed as part of the Channelomics project supported by the European Research Council, at UNIS Inserm 1072. It previously benefited, since 2007, from the support and hosting of the Max Planck Institute for Mathematics in the Sciences (MPI-MIS), the Complex Systems Institute Paris-Ile-de-France (ISC-PIF) and the Institut de Mathématiques de Jussieu - Paris Rive Gauche (IMJ-PRG).

Thank them!!! And thank you to the researchers who supported and helped this work: D.Bennequin, JP.Nadal, J.Petitot, P.Bourgine, J.Jost, A.Sarti, G.Marrelec, H.Bénali, A.Chenciner, J.Touboul, F.Barbaresco, F.Chavane, A.Mohammad-Djafari.

This version is not yet user friendly, its code is not optimised, and many further functionalities remain to be implemented.

If you are interested in using or developing this suite, please do not hesitate to contact me:

pierre.baudot [at] gmail.com