Deep networks are successful models of object recognition, and explain neural codes in the primate ventral stream. However, visual scenes typically comprise multiple objects, and scene understanding requires processing of the relations among them. In primates, this process depends on the dorsal stream. I will describe a deep learning architecture that is inspired by the dual-streams architecture of the primate brain. Unlike standard architectures, this network is able to solve a canonical visual reasoning task that depends on the parietal cortex: to count the number of items in a scene, even if those items are entirely novel (zero-shot counting). In doing so, the network naturally learns neural representations of number and space that resemble those found in the primate parietal cortex, follows human trajectories for the development of numerical cognition, and successfully predicts new behavioural effects in eye tracking experiments.Â