Overview

Information theory provides a mathematical framework for formulating and quantifying the fundamental limits of data compression and communication [1]. Although rooted in analog and digital communication, the notions of data compression and communication are relevant to other domains as well; as such, information theory spans a number of research fields. The aim of formulating, understanding, and quantifying the storage and processing of information is a thread that ties these disparate fields together, and in particular the study of cognition in humans and machines [2–7].


Specifically, researchers have sought an integrative computational theory of human and artificial cognition by leveraging information-theoretic principles as bridges between various cognitive functions and neural representations. Insights from information-theoretic formalization have also led to tangible outcomes that influence the operation of artificial intelligent systems. One example is the information bottleneck (IB) approach [8], which has yielded insights into learning in neural networks (NNs) [9,10], as well as tools for slow feature analysis [11] and speech and speaker recognition [12,13]. A central application of the IB approach to NNs views the transfer of data between layers as an autoencoder [14]. A variational approximation of the IB then produces a tractable objective whose minimization results in efficient training, known as the variational IB (VIB) [15]. In the other direction, the variational autoencoder (VAE) framework has also been used to explain cognitive functions, as done, for example, in [16]. The IB approach has also been applied to emergent communication (EC) in both humans and machines, using a vector-quantized VIB (VQ-VIB) method that extends the aforementioned VIB [17,18]. Another example is the trade-off between information and value in the context of sequential decision making [19]. This formalism has led to concrete methods for solving sequential decision-making problems [20–22] and has been used in an experimental study of mouse navigation [23], as well as in studies of drivers' eye-gaze patterns [24–26] and drivers' language models [27].
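
To ground these formalisms, it may help to state the IB objective and its variational counterpart explicitly; the notation below is ours, and conventions (e.g., the placement of the trade-off parameter) vary across the cited papers. In the IB formulation [8], a compressed representation T of a source X is chosen to remain informative about a relevance variable Y by minimizing, over encoders p(t|x),

    \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta\, I(T;Y),

where \beta \ge 0 trades off compression of X against preservation of information about Y. The VIB [15] replaces the intractable mutual-information terms with variational bounds, giving the trainable loss

    \mathcal{L}_{\mathrm{VIB}} \;=\; \mathbb{E}_{p(x,y)}\,\mathbb{E}_{q_\phi(t\mid x)}\!\left[-\log q_\psi(y\mid t)\right] \;+\; \beta\, \mathbb{E}_{p(x)}\, D_{\mathrm{KL}}\!\left(q_\phi(t\mid x)\,\big\|\,r(t)\right),

with encoder q_\phi, decoder q_\psi, and a fixed variational marginal r(t).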


In aiming to understand machine learning (ML), specifically in the context of NNs, or cognition, we need theoretical principles (hypotheses) that can be tested. To quote Shannon [28]: "I personally believe that many of the concepts of information theory will prove useful in these other fields - and, indeed, some results are already quite promising - but the establishing of such applications is not a trivial matter of translating words to a new domain, but rather the slow tedious process of hypothesis and experimental verification. If, for example, the human being acts in some situations like an ideal decoder, this is an experimental and not a mathematical fact, and as such must be tested under a wide variety of experimental situations." Today, both ML and cognition research can draw on huge amounts of data. Establishing quantitative theories and corresponding computational methods can therefore have a massive impact on progress in these fields.


Broadly, this workshop aims to further the understanding of information flow in cognitive processes and in neural network models of cognition. More concretely, this year's workshop goals are twofold. On the one hand, we wish to provide a fruitful platform for discussions of how the storage and processing of information, in either human or artificial cognitive systems, can be formulated via information-theoretic measures, such as the formalisms mentioned above. In particular, the workshop is intended to let information theory researchers take part in such discussions, allowing first-hand sharing of knowledge and ideas. On the other hand, we hope this workshop can advance, sharpen, and enhance the research done on the computation of information-theoretic quantities, specifically for the needs and benefit of cognition research. The two aims of the workshop are not independent of one another: any information-theoretic formalism that we wish to verify experimentally has to be, in some sense, computationally feasible. Moreover, we hope that computation and estimation methods will be developed in a way that is tailored to the open questions in human and artificial cognition.


The workshop focuses on bringing together researchers interested in applying information-theoretic approaches with researchers focused on the computation and estimation of information-theoretic quantities, with the aim of tightening the collaboration between the two communities. Researchers interested in applying information-theoretic approaches come from cognitive science, neuroscience, linguistics, economics, and beyond. Efforts in the computation and estimation of information-theoretic quantities are pursued for many reasons, and this line of research is gaining increasing attention due to advances in ML. Moreover, in recent years these researchers have created new methods for measuring information-related quantities (e.g., [29]); a minimal sketch of one such estimator is given below.
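
As an illustration of such estimation methods, the following is a minimal sketch, in PyTorch, of a neural mutual-information estimator in the spirit of MINE [29], based on the Donsker-Varadhan lower bound I(X;Z) >= E_{p(x,z)}[T(x,z)] - log E_{p(x)p(z)}[exp T(x,z)]. This is not the authors' reference implementation: the network sizes, hyperparameters, and toy data are ours for illustration, and MINE's moving-average correction for the gradient bias is omitted for brevity.

    import math
    import torch
    import torch.nn as nn

    class StatisticsNet(nn.Module):
        """Critic T_theta(x, z) for the Donsker-Varadhan (DV) bound."""
        def __init__(self, dim_x, dim_z, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim_x + dim_z, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x, z):
            return self.net(torch.cat([x, z], dim=-1))

    def dv_lower_bound(T, x, z):
        """DV estimate of I(X;Z) from a batch of joint samples.
        Marginal samples are obtained by permuting z within the batch."""
        joint_term = T(x, z).mean()
        z_perm = z[torch.randperm(z.size(0))]
        scores = T(x, z_perm).squeeze(-1)
        marg_term = torch.logsumexp(scores, dim=0) - math.log(scores.size(0))
        return joint_term - marg_term

    # Toy check on correlated Gaussians, where I(X;Z) = 0.5*ln(5) ~ 0.805 nats.
    torch.manual_seed(0)
    T = StatisticsNet(1, 1)
    opt = torch.optim.Adam(T.parameters(), lr=1e-3)
    for step in range(2000):
        x = torch.randn(512, 1)
        z = x + 0.5 * torch.randn(512, 1)
        loss = -dv_lower_bound(T, x, z)   # maximize the bound
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        x = torch.randn(4096, 1)
        z = x + 0.5 * torch.randn(4096, 1)
        print(f"DV estimate: {dv_lower_bound(T, x, z).item():.3f} nats")

Since the estimator optimizes a lower bound, undershooting the analytic value is expected; scaling such estimators to the high-dimensional representations found in deep NNs is precisely the kind of open computational question the workshop targets.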


Finally, and more specifically, some questions within the scope of this workshop are:

• To what extent can information-theoretic principles advance the understanding of human cognition and its emergence from neural systems?

• To what extent can novel methods for estimating information-theoretic quantities be applied to support the understanding of human and artificial cognition?

• What are the gaps between computational capabilities and the ability to validate and expand information-theoretic formalisms in cognition research?

• Considering NNs specifically, can information-theoretic concepts shed light on their operation?

• What are effective methods for computing the relevant information-theoretic quantities within an NN, and can they be applied to large and deep NNs?

• What are the key challenges for future research harnessing information theory towards broadening the understanding of human and artificial cognition? 

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006.
[2] H. B. Barlow, "Possible principles underlying the transformation of sensory messages," Sensory Communication, vol. 1, no. 1, 1961.
[3] S. E. Palmer, O. Marre, M. J. Berry, and W. Bialek, "Predictive information in a sensory population," Proceedings of the National Academy of Sciences, vol. 112, no. 22, pp. 6908–6913, 2015.
[4] G. Tkacik and W. Bialek, "Information processing in living systems," Annual Review of Condensed Matter Physics, vol. 7, no. 1, pp. 89–117, 2016.
[5] K. Friston, "The free-energy principle: A unified brain theory?" Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127–138, 2010.
[6] ——, "Functional and effective connectivity: A review," Brain Connectivity, vol. 1, no. 1, pp. 13–36, 2011.
[7] N. M. Timme and C. Lapish, "A tutorial for information theory in neuroscience," eNeuro, vol. 5, no. 3, 2018.
[8] N. Tishby, F. C. Pereira, and W. Bialek, "The information bottleneck method," arXiv preprint physics/0004057, 2000.
[9] R. Shwartz-Ziv and N. Tishby, "Opening the black box of deep neural networks via information," arXiv preprint arXiv:1703.00810, 2017.
[10] N. Tishby and N. Zaslavsky, "Deep learning and the information bottleneck principle," in Proc. IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 2015.
[11] R. Turner and M. Sahani, "A maximum-likelihood interpretation for slow feature analysis," Neural Computation, vol. 19, no. 4, pp. 1022–1038, 2007. [Online]. Available: https://doi.org/10.1162/neco.2007.19.4.1022
[12] R. M. Hecht, E. Noor, and N. Tishby, "Speaker recognition by Gaussian information bottleneck," in Proc. Interspeech, 2009, pp. 1567–1570.
[13] R. M. Hecht, E. Noor, G. Dobry, Y. Zigel, A. Bar-Hillel, and N. Tishby, "Effective model representation by information bottleneck principle," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 8, pp. 1755–1759, 2013.
[14] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in International Conference on Learning Representations (ICLR), 2014.
[15] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, "Deep variational information bottleneck," in International Conference on Learning Representations, 2017. [Online]. Available: https://openreview.net/forum?id=HyxQzBceg
[16] G. Aridor, R. A. da Silveira, and M. Woodford, “Information-constrained coordination of economic behavior,” 2023.
[17] N. Zaslavsky, C. Kemp, T. Regier, and N. Tishby, “Efficient compression in color naming and its evolution,” Proceedings of the National Academy of Sciences, vol. 115, no. 31, pp. 7937–7942, 2018. [Online]. Available: https://www.pnas.org/doi/abs/10.1073/pnas.1800521115
[18] M. Tucker, R. P. Levy, J. Shah, and N. Zaslavsky, "Trading off utility, informativeness, and complexity in emergent communication," in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. [Online]. Available: https://openreview.net/forum?id=O5arhQvBdH
[19] N. Tishby and D. Polani, "Information theory of decisions and actions," in Perception-Action Cycle, ser. Springer Series in Cognitive and Neural Systems, V. Cutsuridis, A. Hussain, and J. Taylor, Eds. New York, NY: Springer, 2011, ch. 19. [Online]. Available: https://doi.org/10.1007/978-1-4419-1452-1_19
[20] J. Rubin, O. Shamir, and N. Tishby, "Trading value and information in MDPs," in Decision Making with Imperfect Decision Makers, ser. Intelligent Systems Reference Library, T. V. Guy, M. Kárný, and D. Wolpert, Eds. Berlin, Heidelberg: Springer, 2012, vol. 28, pp. 57–74. [Online]. Available: https://doi.org/10.1007/978-3-642-24647-0_3
[21] S. Tiomkin and N. Tishby, "A unified Bellman equation for causal information and value in Markov decision processes," arXiv preprint arXiv:1703.01585, 2017. [Online]. Available: http://arxiv.org/abs/1703.01585
[22] T. Tanaka, H. Sandberg, and M. Skoglund, "Transfer-entropy-regularized Markov decision processes," 2020.
[23] N. Amir, R. Suliman-Lavie, M. Tal, S. Shifman, N. Tishby, I. Nelken, and B. A. Richards, "Value-complexity tradeoff explains mouse navigational learning," PLOS Computational Biology, vol. 16, no. 12, 2020.
[24] R. M. Hecht, A. B. Hillel, A. Telpaz, O. Tsimhoni, and N. Tishby, "Information constrained control analysis of eye gaze distribution under workload," IEEE Transactions on Human-Machine Systems, vol. 49, no. 6, pp. 474–484, 2019.
[25] R. M. Hecht, A. Telpaz, G. Kamhi, A. B. Hillel, and N. Tishby, "Information constrained control for visual detection of important areas," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 4080–4084.
[26] R. M. Hecht, A. Telpaz, G. Kamhi, O. Tsimhoni, A. Bar-Hillel, and N. Tishby, "Modeling the effect of driver's eye gaze pattern under workload: Gaussian mixture approach," in Proc. CogSci, 2020.
[27] R. M. Hecht, A. Bar-Hillel, S. Tiomkin, H. Levi, O. Tsimhoni, and N. Tishby, "Cognitive workload and vocabulary sparseness: Theory and practice," in Proc. Interspeech, 2015.
[28] C. E. Shannon, "The bandwagon," IRE Transactions on Information Theory, vol. 2, no. 1, p. 3, 1956.
[29] M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm, "Mutual information neural estimation," in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 80, J. Dy and A. Krause, Eds., 2018, pp. 531–540.