Towards Visual Reasoning

From Visual Question-Answering to Visual Reasoning

As performance on standard computer vision tasks has improved, computer vision is moving towards more holistic reasoning systems. One such line of research is visual question answering, where a visual system is asked questions about images. However, even the most challenging existing tasks can still be handled by systems with limited reasoning capabilities. For instance, the state of the art on VQA, the most popular visual question answering dataset, relies heavily on pre-trained visual features. Yet other elements associated with human intelligence, such as memory, step-by-step planning, compositional thinking, or symbolic manipulation, are often ignored. We want to close the gap between "fast thinking", which is often impulsive and in this context responsible for a quick interpretation of the visual scene, and "slower thinking", which is more algorithmic. To achieve these goals, we need to think, build, think again, and build again, developing suitable benchmarks, architectures, and algorithms. Can you create the first system that connects vision with reasoning in the next three or four years?


Antol et al. VQA: Visual Question Answering. ICCV'15 & CVPR'18

Malinowski et al. Ask Your Neurons. NeurIPS'14 & ICCV'15

Research directions

In this research programme, you will

  • define what visual reasoning is by building various datasets and tasks

  • build architectures that deal with basic reasoning tasks such as analogies, counting, intuitive physics, and memory, all grounded in perception

  • understand the limitations of the current systems

  • draw inspiration from biological systems

  • move beyond the standard paradigm of learning from pixels towards reasoning about pixels

  • create the first systems that can perform geometric manipulation for the purpose of reasoning

  • create architectures or training strategies that can mitigate bias through reasoning

We will work with deep learning and reinforcement learning.

An ideal candidate

If you are

  • ambitious

  • creative

  • technically well-versed

  • open to new experiences and quick to learn new technologies

  • communicative

  • experienced in at least one of: machine learning, computer vision, natural language processing, or automated reasoning

then you are the ideal candidate. You should also have completed a Master’s degree in Computer Science, Mathematics, or a related field.

Johnson et al. CLEVR. CVPR'17

Your PhD studies

In the course of your PhD, you will

  • work on cutting-edge scientific directions together with ambitious and highly motivated researchers from the leading research institutes

  • publish at top-tier machine learning or computer vision conferences (NeurIPS, ICLR, CVPR, ICCV, ECCV)

  • shape the scientific landscape in machine learning or computer vision

  • lead scientific projects

  • be enrolled in the PhD programme at the University of Warsaw

  • study under the supervision of Mateusz Malinowski (DeepMind) and Henryk Michalewski (Google, Oxford and the University of Warsaw)

  • receive compensation of up to 7000 PLN (1570 EUR) gross per month in the form of a scholarship

The application process

Along with your CV (max. 2 pages), please submit a cover letter (1 page) explaining your interest in the programme. The CV should include information about your

  • scientific achievements

  • industrial achievements

  • university grades

  • internships

Please also provide at least one recommendation letter (academic or industrial).

Please send your application to mateuszm@google.com and henrykm@google.com with the subject "PHD APPLICATION: full name".


Selected candidates will participate in an online interview.

Deadline: 14th of October 2020.

Mateusz Malinowski, PhD, is a scientist at DeepMind. He leads research on visual reasoning, vision + language, and scalable training of video models. He graduated from the Max Planck Institute for Informatics and Saarland University and has received multiple awards for his contributions to research.

Henryk Michalewski, PhD, is a scientist at Google and a professor of computer science at the University of Warsaw. He leads research on automated reasoning and reinforcement learning. Starting in mid-2021, Henryk will be a Leverhulme Fellow at the University of Oxford.