Spatiotemporal structures of human visual experience

When you are looking for a book, it may be partially hidden behind a coffee cup. When reading the book, you register the location of each letter within each word. To put the book back on the bookshelf, you reach toward the left and insert it between other books. The next time you want to read it, you remember that it is located on the left, next to a red dictionary. These acts are intuitive, natural, and coherent: we see things not in isolation or free-floating but in relation to each other (spatial structure), and there is an unmistakable continuity of our visual experience from one moment to the next (temporal structure). Such visual experience is ubiquitous and constitutes a large part of who we are: our relation with the world, our past, and our future. But how does information across space and time get combined to give rise to a coherent conscious experience (the binding problem)? Extracting spatiotemporal structures requires integrating information beyond local regions and beyond the immediate moment. Although our visual experience feels effortless, understanding the world across space and over time is an incredibly complex task. The long-term objective of my research is to understand how this occurs: how does human cognition emerge from spatial and temporal structures?

In the standard atomistic approach to vision, the visual system is thought to represent the color of the sky as blue, the position of a letter as its absolute location, and the identity of an object as what it is. In contrast, I have been developing a new, structural approach, arguing that the visual system represents relations: the color of the sky as bluer than the beach, the position of a letter as its relative location in a pattern, and the identity of an object as how it relates to its position and to other objects. In this approach, representations are fundamentally derivative, such that the absolute location of a speck of ink on paper matters little; what really counts is its relative location in a pattern. Indeed, relational representations may well guide the acquisition of absolute representations; for example, knowledge of absolute durations (hours, minutes) is learned much later in development than knowledge of their relative ordering.

I therefore begin from the assumption that spatiotemporal structures are critical to our understanding of the visual world. My research program aims to understand i) how spatial relations are constructed within a visual scene, and how they in turn shape other cognitive processes; and ii) how past visual experience and learning guide perception and attention.