ANR WHOLESCENE

THE SPACE IN BETWEEN: WHERE LOCAL MEETS GLOBAL IN HUMAN VISION (PRCI: FRANCE-SWITZERLAND)

Funded by Agence Nationale de la Recherche (ANR WHOLESCENE)

During the last seventy years, vision research has successfully pursued a programme structured around localized neural circuits. In behavioral and neural experiments, these circuits are typically characterized using simple laboratory stimuli. At the heart of this classic framework is the assumption that separate neural circuits operate largely independently of each other, and that their hierarchical feedforward integration eventually leads to complex object and scene recognition. However, this explanatory framework collapses altogether once stimuli are placed into context, for example in natural scenes: local circuits do not operate independently. Although this fundamental limitation is widely acknowledged, the research community has devoted relatively little effort to understanding contextual vision in its full articulation (i.e., beyond summary and natural-scene statistics). The few attempts to tackle contextual vision appear stuck between two worlds: globally embedded circuits on the one hand, and phenomenological Gestalt on the other. With this grant, our laboratory joined forces with Michael Herzog's laboratory (EPFL, Switzerland) to propose a new framework for contextual vision that marries two conceptual/computational approaches spearheaded by the two laboratories, based on their parallel work over the past 15 years.

Figure 1. Research from our laboratory has focused primarily on detailed characterization of the computational circuits that support processing within a localized region of the scene (gain-control circuit in the middle). Research from the Herzog laboratory has instead focused on characterizing the computations that occur across the scene to extract global Gestalt configurations (Laminart circuit in the surround). The red space in between, about which we know close to nothing, is the focus of this joint grant project.

To provide a concrete example of the different approaches adopted by our laboratory and the Herzog laboratory, consider a target line surrounded by multiple elements across a large scene. Our laboratory would model the local process using a physiologically plausible circuit that incorporates various nonlinear elements such as divisive gain control, temporal delay lines, energy extraction, and related operators (center circuit in Figure 1). This circuit can capture detailed aspects of how perceptual tuning depends on contrast and other image properties (Neri 2018). To model the effect of contextual elements, the circuit would be re-parameterized and/or some of its branches would be suppressed/activated; the resulting simulations may reproduce the context-dependent effects observed experimentally. However, this approach fails to address the most pressing question: what is the origin of the signal that instructs the circuit to switch parameterization or to drop/add a specific branch? A purely local approach focused on the circuit itself provides no mechanistic answer. The scene is treated as a vague entity that somehow projects its influence onto the local target, without that influence ever being modeled explicitly.
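To make the local component more tangible, the sketch below illustrates a generic divisive gain-control (normalization) stage of the kind mentioned above. It is not the laboratory's actual circuit, which also includes temporal delay lines and energy extraction; the function name, parameter names and values (sigma, n, w_context) are assumptions chosen purely for illustration.

```python
import numpy as np

def divisive_gain_control(drive, context_drive, sigma=0.1, n=2.0, w_context=1.0):
    """Generic divisive-normalization stage (illustrative only).

    drive         : linear filter response to the local target
    context_drive : filter responses to surrounding elements
    sigma, n      : semi-saturation constant and exponent
    w_context     : weight of the contextual pool in the normalization signal
    """
    excitation = drive ** n
    norm_pool = sigma ** n + excitation + w_context * np.sum(np.asarray(context_drive, dtype=float) ** n)
    return excitation / norm_pool

# The same target drive produces a weaker response as the contextual
# pool grows, mimicking surround suppression.
target = 0.5
print(divisive_gain_control(target, context_drive=[]))               # isolated target
print(divisive_gain_control(target, context_drive=[0.5, 0.5, 0.5]))  # target embedded in context
```

Even in this toy version the limitation described above is visible: the contextual influence enters only through a hand-set weight (w_context), and nothing in the circuit specifies where that signal comes from or how it should depend on the configuration of the scene.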

Herzog and collaborators would approach this problem from a different, complementary perspective. The target line is encoded by a “line-like” filter that is not intended as a faithful representation of the underlying feature-extracting mechanism. This filter cannot capture detailed aspects of how perceptual tuning depends on contrast and other image properties, not even locally. On the other hand, the scene is analyzed by a fully specified network (surround circuit in Figure 1) to extract a complex multi-layered representation that acts as a contextual signal and depends on the specific configuration of its constituent elements. This dependence on contextual configuration is not vaguely specified: it is mechanistic and highly specific, capable of reproducing puzzling empirical measurements (Manassi et al. 2016).
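For contrast, here is a deliberately simplified, hypothetical sketch of a configuration-dependent contextual signal. It is not the Laminart network; the grouping rule and the orientation-based metric are assumptions, used only to illustrate that the contextual signal is an explicit function of how the surrounding elements are arranged, not of their summary statistics alone.

```python
import numpy as np

def grouping_strength(flanker_orientations, target_orientation):
    """Toy configuration-dependent contextual signal (orientations in radians).

    Returns a value in [0, 1] that is high when the flankers group with each
    other (similar orientations) while segregating from the target. A stand-in
    for a full grouping network, not the Laminart model itself.
    """
    flankers = np.asarray(flanker_orientations, dtype=float)
    # Coherence among flankers: circular resultant length on doubled angles
    flanker_coherence = np.abs(np.mean(np.exp(1j * 2.0 * flankers)))
    # Similarity to the target: 1 when identical orientation, 0 when orthogonal
    similarity_to_target = 0.5 * (1.0 + np.mean(np.cos(2.0 * (flankers - target_orientation))))
    return flanker_coherence * (1.0 - similarity_to_target)

# Different flanker configurations yield different contextual signals
print(grouping_strength([0.0, 0.0, 0.0], target_orientation=np.pi / 2))              # coherent flankers, orthogonal target
print(grouping_strength([0.0, np.pi / 4, np.pi / 2], target_orientation=np.pi / 2))  # heterogeneous flankers
```

In the full network, this scalar would of course be replaced by the multi-layered representation described above, computed by the grouping circuitry operating across the whole scene.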

Where is the challenge? On a superficial level, integrating the two approaches outlined above may seem straightforward: one might think it possible to simply glue them together. Unfortunately, the main ingredient for achieving this goal is missing: the glue itself. There is a space between local and global models, an interfacing mechanism about which we know nothing (red region in Figure 1). The research programme proposed in this grant will allow us to discover its structure.