ANR-17-EURE-0017

Novel directions in network-based modelling of human sensory processes

Funded by Agence Nationale de la Recherche (ANR-17-EURE-0017) jointly with Srdjan Ostojic

We have all heard of neural networks and their application to sensory neuroscience. A popular approach is to focus on a higher-level task (e.g. natural image categorization) that is poorly explained by complex models of the sensory process, and tailor the network to this problem with satisfactory results (Yamins & DiCarlo 2016; Fig. 1B). We plan to approach the connection between modelling and sensory processing in a different fashion, in some sense opposite to the one above, and in another sense complementary to it. We will focus on a low-level visual task (e.g. detection of a bar) that is poorly explained by existing computational accounts of the sensory process (Fig. 1A), and determine whether a network model that is optimized for higher-level tasks can explain the empirical characteristics of the low-level visual process (Fig. 1C). Why? What is the point of approaching it this way?

Figure 1 Current approaches (top) train different network models on either low-level (A) or higher-level tasks (B), and test them on the same task they were trained on. This approach has led to unsatisfactory results for low-level tasks (red x symbol in A). In our approach (bottom) we will train one network on higher-level tasks and test it on both low-level and higher-level tasks (C). We expect that the structure imposed by higher-level constraints will reproduce the idiosyncratic behaviour measured from human vision in low-level tasks. In this cartoon, the low-level task is indicated by a dark bar embedded in visual noise (left), the higher-level task by a similar element surrounded by natural image statistics (right).

We now have extensive datasets from humans performing various detection/discrimination tasks, involving minimal stimulus designs (Neri 2015; Neri 2018) as well as complex natural scenes (Neri 2014; Neri 2017). Perhaps surprisingly, we find that, when existing theoretical/modelling frameworks are tailored separately to each task category, they are better at capturing results from higher-level tasks (Neri 2017) than from low-level tasks (Neri 2018). How can this be? After all, one may naively expect that, if we can explain the more complex sensory process, we should also be able to explain the simpler processes supposedly underlying and supporting it (see below).


There are three fundamental difficulties with this way of thinking about the problem. First, it is not easy to define what is complex and what is simple in sensory discrimination tasks: is it more 'complex' to identify animals in easily visible natural images (Thorpe et al 1996), or to detect a faint signal embedded within challenging visual noise (Pelli & Farell 1999)? Second, even given a reasonable definition of simple and complex tasks, it is not necessarily the case that humans perform complex tasks by engaging the same mechanisms that operate for simpler tasks (Cox 2014). Third, from a purely empirical standpoint, the level of detail that can feasibly be achieved when characterizing simple tasks is often not available for complex tasks: the sensory processes underlying low-level tasks can be characterized with far greater precision and richness of experimental detail, avoiding interpretational issues that beset results from higher-level tasks (Wichmann et al 2010).


For all the above reasons, it is not at all clear that a model providing a satisfactory account of, say, natural image categorization (Yamins & DiCarlo 2016; Neri 2017) will also capture the characteristics of, say, detection of a simple vertical segment (Neri 2018). On the contrary, as we have pointed out above, our extensive empirical evidence from a number of experimental projects indicates that current theoretical frameworks are often able to capture human behaviour in higher-level tasks to a satisfactory degree, but fail miserably when attempting to reproduce behaviour associated with low-level detection/discrimination tasks (Neri 2018).


We will tackle this problem in three stages: 1) collect highly constraining data from a low-level task in which the target signal is either presented in isolation or embedded within higher-level contextual elements (primarily supervised by Neri with input from Ostojic); 2) develop a network model trained to operate on the higher-level elements/tasks, then challenge the same model with the low-level task to determine whether it displays idiosyncratic behaviour mirroring that measured in humans (primarily supervised by Ostojic with input from Neri); 3) exploit the model to design an additional set of experiments for which it makes specific non-trivial predictions, and test those predictions on human observers (jointly supervised by Neri and Ostojic; this stage remains unspecified until completion of stages #1-#2).
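The logic of stage #2 (train under higher-level constraints, then probe with the low-level task without retraining) can be illustrated with a deliberately minimal toy sketch. This is not the actual model, whose architecture remains to be determined: here the "network" is just a logistic-regression template learner, the "higher-level" stimuli are bars embedded in spatially correlated backgrounds (a crude stand-in for natural image statistics), and the "low-level" probe is the same bar in white noise. All numerical details (image size, bar amplitude, smoothing scheme) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
SIDE = 8            # images are SIDE x SIDE pixels
D = SIDE * SIDE
N = 1000            # trials per task

def bar_template():
    # target signal: a vertical bar of unit amplitude in the central column
    t = np.zeros((SIDE, SIDE))
    t[:, SIDE // 2] = 1.0
    return t.ravel()

def correlated_noise(n):
    # crude "natural-like" background: white noise smoothed along rows
    w = rng.standard_normal((n, SIDE, SIDE))
    w = (w + np.roll(w, 1, axis=2) + np.roll(w, -1, axis=2)) / np.sqrt(3)
    return w.reshape(n, D)

def white_noise(n):
    return rng.standard_normal((n, D))

def make_trials(n, background):
    # label 1 -> bar present, label 0 -> background alone
    labels = rng.integers(0, 2, n)
    x = background(n) + labels[:, None] * bar_template()
    return x, labels

# train on the "higher-level" task (bar in structured background)
x_hi, y_hi = make_trials(N, correlated_noise)
w, b = np.zeros(D), 0.0
for _ in range(300):  # plain logistic regression by gradient descent
    p = 1.0 / (1.0 + np.exp(-(x_hi @ w + b)))
    g = p - y_hi
    w -= 0.05 * (x_hi.T @ g) / N
    b -= 0.05 * g.mean()
acc_hi = (((x_hi @ w + b) > 0) == y_hi).mean()

# probe the SAME weights with the "low-level" task (bar in white noise)
x_lo, y_lo = make_trials(N, white_noise)
acc_lo = (((x_lo @ w + b) > 0) == y_lo).mean()
print(f"higher-level accuracy: {acc_hi:.2f}, low-level probe: {acc_lo:.2f}")
```

In this toy case transfer is trivially good because the optimal template is the bar itself; the interesting empirical question for the project is precisely where such transfer breaks down, and whether the model's failures mirror the idiosyncratic failures measured in human observers.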

Relevant publications:

• Neri P (2018) The empirical characteristics of human pattern vision defy theoretically-driven expectations. PLoS Computational Biology 14(12): e1006585

• Neri P (2017) Object segmentation controls image reconstruction from natural scenes. PLoS Biology 15(8): e1002611

• Neri P (2015) The elementary operations of human vision are not reducible to template matching. PLoS Computational Biology 11(11): e1004499

• Neri P (2014) Semantic control of feature extraction from natural scenes. Journal of Neuroscience 34: 2374-2388