Machine Number Sense:
A Dataset of Visual Arithmetic Problems
for Abstract and Relational Reasoning

As a comprehensive indicator of mathematical thinking and intelligence, the number sense (Dehaene, 2011) bridges the induction of symbolic concepts and the competence of problem-solving. To endow machine intelligence with such a crucial cognitive ability, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model: an And-Or Graph (AOG). These visual arithmetic problems take the form of geometric figures: each problem has a set of geometric shapes as its context, with number symbols embedded in them. Solving such problems is not trivial; the machine must not only recognize the numbers, but also interpret them in light of their contexts, shapes, and relations (e.g., symmetry), together with the proper operations. We benchmark the MNS dataset using four predominant neural network models as baselines on this visual reasoning task. Comprehensive experiments show that current neural-network-based models still struggle to understand number concepts and relational operations. We show that a simple brute-force search algorithm can solve some of the problems without context information. Crucially, taking geometric context into account via an additional perception module yields a sharp performance gain with fewer search steps. Altogether, we call for attention to fusing classic search-based algorithms with modern neural networks to discover the essential number concepts in future research.

Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu

The 34th AAAI Conference on Artificial Intelligence (AAAI 2020), February 7-12, 2020, New York (Oral Presentation)

Code / Dataset

Dataset Generation

A test is generated by parsing and sampling an And-Or Graph (AOG). Each problem has an internal hierarchical tree structure composed of And-nodes and Or-nodes; an And-node denotes the decomposition of a larger entity into parts, and an Or-node denotes a choice among alternative decompositions.

  • We design three problem types: (a) Combination, (b) Composition, and (c) Partition, each of which has a distinctive layout.

  • Each problem contains two important components:

    • Layout component serves as the problem context. The attributes vary with different problem types.

    • Algebra component serves as the problem content.

      • A crucial attribute is the style of interpretation: holistic or analytic. From a holistic perspective, all the numbers in a panel enter a single calculation together as a whole. From an analytic perspective, the numbers are grouped into several parts, each of which undergoes its own calculation.

      • Another attribute specifies the mathematical objects in the problem: a list of randomly sampled operators and integer constants. The sampled operators and constant slots are arranged in an in-order binary tree, which is then instantiated by sampling numbers.
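The instantiation step above can be sketched as a tiny generator: internal nodes of a randomly sampled binary tree hold operators, leaves hold integer constants, and in-order evaluation of the tree yields the arithmetic expression. All names here (`sample_expr`, `evaluate`, the depth and constant ranges) are illustrative assumptions, not the authors' generator code.

```python
import random

# Operators available to internal tree nodes (illustrative subset).
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def sample_expr(depth, rng):
    """Recursively sample an operator tree; leaves are integers in [1, 9]."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(1, 9)                       # constant leaf
    op = rng.choice(list(OPS))
    return (op, sample_expr(depth - 1, rng), sample_expr(depth - 1, rng))

def evaluate(node):
    """Evaluate the sampled tree bottom-up (in-order structure)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

rng = random.Random(0)
tree = sample_expr(2, rng)
print(tree, "=", evaluate(tree))
```

Each sampled tree thus fixes both the operators and the constants of one problem instance; the same tree can then be rendered under either a holistic or an analytic grouping.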

Experiments and Analysis

We benchmark the proposed MNS dataset using both predominant neural network models and classic search-based algorithms.

  • Four state-of-the-art neural-network-based CV models for visual problem-solving:

    • a front-end CNN as feature extractor;

    • an LSTM model with a CNN backbone and an MLP head;

    • an image classifier based on ResNet;

    • a relational network (RN).

  • Two types of symbolic search-based models:

    • pure symbolic search, whose input is only the numbers in each panel;

    • context-guided search, the input includes both the numbers and semantic context.
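As a rough illustration of pure symbolic search (a hedged sketch under our own naming, not the paper's implementation), one can enumerate operator assignments between a panel's numbers until a formula fits every complete panel, then apply that formula to the query panel:

```python
from itertools import product

# Candidate binary operators (illustrative subset).
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def apply_ops(nums, ops):
    """Fold the numbers left to right with the given operator sequence."""
    acc = nums[0]
    for op, n in zip(ops, nums[1:]):
        acc = OPS[op](acc, n)
    return acc

def search(panels):
    """panels: list of (numbers, result); the query panel has result None."""
    known = [p for p in panels if p[1] is not None]
    query = next(p for p in panels if p[1] is None)
    n_ops = len(query[0]) - 1
    for ops in product(OPS, repeat=n_ops):         # brute-force enumeration
        if all(apply_ops(nums, ops) == res for nums, res in known):
            return apply_ops(query[0], ops)        # first consistent formula
    return None

# Toy example: each panel encodes a * b + c; the answer for the last panel
# is inferred from the formula that fits the first two.
print(search([([2, 3, 1], 7), ([4, 2, 5], 13), ([3, 3, 2], None)]))
```

Context-guided search would differ by letting the perceived geometric layout prune this enumeration, e.g., restricting which numbers are grouped together, which is why it reaches answers in fewer steps.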

In addition, human performance on the MNS dataset was collected.

Main Results:

  • The overall accuracy of neural network models is close to that of pure symbolic search within 100 steps and of context-guided search within 50 steps; both search budgets are small relative to the large problem space.

  • The performance of search algorithms varies across problem types, interpretation styles, and numbers of integers, in strong contrast to the consistent performance of neural network models.

  • Although pure symbolic search is able to solve some problems, context-guided search has generally better performance, especially on problems with higher complexity.

  • Compared to the benchmarked computational models, humans achieve significantly higher accuracy on all problem types without extensive training.

Possible Reasons:

  • The representations of number symbols and geometric contexts differ:

    • search algorithms: symbolized concepts;

    • neural network models: extracted features.

  • The internal processing of visual information is distinct:

    • search algorithms: process number concepts in a sequential manner;

    • neural network models: process visual features in parallel.

  • The ability to separate problem content from problem context also differs:

    • Search algorithms have an advantage over neural network models, since the number symbols and the geometric context information are fed to search algorithms separately.

Conclusions and Discussions

  • Compared to simple symbolic search-based models, the poor performance of neural network models suggests their insufficiency in symbolic processing and concept understanding, as well as their difficulty in flexibly combining content and context to solve problems.

  • Challenges for future work: how to make symbolic concepts emerge directly from pixels with minimal supervision, how to extract meaningful relations from contextual information, and how to reason and make inductions based on those concepts and relations.

  • Fusing neural network models' strong capacity for feature extraction on large-scale data with search-based algorithms' explicit knowledge structures for fit-for-purpose problem-solving may be an effective approach to relational and abstract reasoning.