Neural architecture search (NAS) aims to shift the manual design of neural network (NN) architectures to algorithmic design. In this setting, the NN architecture itself becomes data that needs to be modeled. Better modeling can help explore novel architectures automatically and open the black box of automated architecture design. To this end, this work proposes a new encoding scheme for neural architectures, the Training-Analogous Graph-based ArchiTecture Encoding Scheme (TA-GATES). TA-GATES encodes an NN architecture in a way that is analogous to its training. Extensive experiments demonstrate that the flexibility and discriminative power of TA-GATES lead to better modeling of NN architectures. We expect our methodology of explicitly modeling the NN training process to benefit broader automated deep learning systems.
State-of-the-art encoding schemes for NN architectures are graph-based: they view an NN architecture as a directed acyclic graph (DAG) with computations on it, and encode the architecture by following its “information flow”. Each type of computation or operation is mapped to a fixed embedding and treated as a transformation of the information flow. This modeling, analogous to NN forward propagation, improves architecture encoding.
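Such a flow-based encoding can be illustrated with a minimal NumPy sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the function name `encode_dag`, the sigmoid gating, and the adjacency-matrix format are all hypothetical choices made for clarity.

```python
import numpy as np

def encode_dag(adj, op_embs, d=8, seed=0):
    """Toy flow-based DAG encoding (hypothetical sketch).

    adj[i][j] == 1 means node i feeds node j; op_embs[j] is the fixed
    embedding of the operation at node j. Each node's "information" is
    the sum of its inputs, transformed (here: element-wise gated) by its
    operation embedding; the last node's vector is the encoding.
    """
    rng = np.random.default_rng(seed)
    n = len(op_embs)
    info = [None] * n
    info[0] = rng.normal(size=d)                  # virtual input information
    for j in range(1, n):
        incoming = sum(info[i] for i in range(j) if adj[i][j])
        gate = 1.0 / (1.0 + np.exp(-op_embs[j]))  # sigmoid gate from op embedding
        info[j] = gate * incoming
    return info[-1]
```

Note that because `op_embs[j]` is a fixed per-type embedding, two operations of the same type apply exactly the same transformation wherever they appear; this is precisely the limitation discussed next.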
However, these methods neglect that an NN architecture is not a DAG with fixed computations, but a DAG with learnable operations. The information-processing functionality of an operation depends on its parameters obtained through NN training, so operations of the same type often behave differently in practice. Thus, making all operations of the same type share one transformation across different locations and architectures restricts the representative power of the encoding scheme.
The underlying reason for this issue is that an NN architecture not only describes the computation of one forward pass, but also implies the learning dynamics of its parameters. Following this intuition, we propose the Training-Analogous Graph-based ArchiTecture Encoding Scheme (TA-GATES). During the encoding process of an architecture, TA-GATES mimics the learning dynamics of its parameterized operations. This training-analogous modeling enables TA-GATES to encode NN architectures better.
The analogy between the training (Right) of an NN architecture with 4 nodes (circles 0/1/2/3) and TA-GATES's iterative encoding process (Left).
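The iterative, training-analogous idea can be sketched as alternating a "forward" information-flow pass with a "backward" pass that refines each operation's embedding based on the information it just processed. This is a toy sketch under assumptions: the `tanh`-based update rule stands in for TA-GATES's actual learned update and is not the paper's formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(adj, op_embs, x0):
    """'Forward' pass: propagate information through the DAG, gating each
    node's aggregated input with its current operation embedding."""
    info = [x0]
    for j in range(1, len(op_embs)):
        incoming = sum(info[i] for i in range(j) if adj[i][j])
        info.append(sigmoid(op_embs[j]) * incoming)
    return info

def ta_encode(adj, op_embs, T=3, lr=0.1, seed=0):
    """Training-analogous encoding sketch: T iterations of a forward pass
    followed by a 'backward' embedding refinement, so the effective
    embedding of an operation depends on where it sits in the graph."""
    rng = np.random.default_rng(seed)
    embs = [np.asarray(e, dtype=float).copy() for e in op_embs]
    x0 = rng.normal(size=embs[0].shape)
    for _ in range(T):
        info = forward_pass(adj, embs, x0)
        for j in range(1, len(embs)):
            # hypothetical update: analogous to one training step on op j
            embs[j] = embs[j] + lr * np.tanh(info[j])
    return forward_pass(adj, embs, x0)[-1]
```

Because each node's update depends on the information arriving at that node, two operations of the same type placed at different positions end up with different effective embeddings, mirroring how their trained parameters would diverge.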
We propose to apply symmetry breaking to the initial operation embeddings. Concretely, we propose three symmetry-breaking techniques: 1) injecting random noise; 2) utilizing zero-cost saliency metrics; 3) utilizing zero-cost saliency metrics at every time step.
A single forward pass in the encoding process without (Left) / with (Right) the symmetry-breaking technique.
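As an illustration, the symmetry-breaking variants can all be viewed as perturbations of the initial operation embeddings. The sketch below is hypothetical: the function name `break_symmetry` and the exact "add"/"concat" forms (broadcasting or appending a scalar zero-cost score) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def break_symmetry(op_embs, zc_scores=None, mode="random", sigma=0.1, seed=0):
    """Perturb initial operation embeddings so that operations of the same
    type at different positions start from distinguishable embeddings.

    mode="random": add small Gaussian noise to each embedding.
    mode="add":    add a per-operation zero-cost saliency score (broadcast).
    mode="concat": append the zero-cost score as an extra embedding dim.
    """
    rng = np.random.default_rng(seed)
    if mode == "random":
        return [e + sigma * rng.normal(size=e.shape) for e in op_embs]
    if mode == "add":
        return [e + s for e, s in zip(op_embs, zc_scores)]
    if mode == "concat":
        return [np.concatenate([e, [s]]) for e, s in zip(op_embs, zc_scores)]
    raise ValueError(f"unknown mode: {mode}")
```

The third technique described above (zero-cost metrics at every time step) would, in this toy picture, amount to re-applying the score inside the iterative encoding loop rather than once up front.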
Architecture Performance Prediction Ability Evaluation
First, we compare the ranking correlation between predicted and ground-truth architecture performance for different encoders on four commonly used NAS search spaces: NB101, NB201, NB301, and NDS ENAS. Compared with MLP, LSTM, GCN, and GATES, TA-GATES generally achieves the best ranking correlation across different training ratios, demonstrating the effectiveness of our proposed method.
Kendall's Tau of different encoders on NB101, NB201, NB301, and NDS ENAS. The average results of 9 experiments are reported, with the standard deviation in the subscript.
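Kendall's Tau, the metric reported in the table, measures how often pairs of architectures are ordered the same way by the predictor and by the ground truth. A minimal reference implementation (equivalent to `scipy.stats.kendalltau` when there are no ties):

```python
def kendall_tau(pred, true):
    """Kendall's Tau rank correlation: (#concordant - #discordant) pairs,
    normalized by the total number of pairs. +1 means identical ranking,
    -1 a fully reversed ranking. Assumes no ties for simplicity."""
    n = len(pred)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pred[i] - pred[j]) * (true[i] - true[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, `kendall_tau([1, 3, 2], [1, 2, 3])` yields 1/3: two of the three pairs are ordered consistently, one is swapped.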
Then, we compare the ranking correlation of different symmetry-breaking techniques. Different search spaces and training settings prefer different techniques, but all of them consistently outperform the variant without symmetry breaking.
Kendall’s Tau of using different symmetry-breaking techniques. “None” indicates TA-GATES without symmetry breaking. “Random”, “Add”, “Concat” refer to TA-GATES with the three symmetry-breaking techniques, respectively.
Architecture Search Evaluation
We conduct architecture search on the DARTS search space with our proposed TA-GATES encoder. The top-1 test errors on ImageNet of the architectures discovered by different methods are shown in the figure below. Our discovered architecture achieves a 24.1% test error, better than other methods (e.g., GHN (27.0%), DARTS (26.9%)).
Comparison of NAS-discovered architectures on ImageNet.
@article{ning12ta,
title={TA-GATES: An Encoding Scheme for Neural Network Architectures},
author={Ning, Xuefei and Zhou, Zixuan and Zhao, Junbo and Zhao, Tianchen and Deng, Yiping and Tang, Changcheng and Liang, Shuang and Yang, Huazhong and Wang, Yu}
}