A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS

Abstract

This work proposes a novel Graph-based neural ArchiTecture Encoding Scheme, a.k.a. GATES, to improve predictor-based neural architecture search. Specifically, different from existing graph-based schemes, GATES models the operations as transformations of the propagating information, which mimics the actual data processing of a neural architecture. GATES provides a more reasonable modeling of neural architectures, and can consistently encode architectures from both the “operation on node” and “operation on edge” cell search spaces. Experimental results on various search spaces confirm GATES’s effectiveness in improving the performance predictor. Furthermore, equipped with the improved performance predictor, the sample efficiency of the predictor-based neural architecture search (NAS) flow is boosted. Code is available at https://github.com/walkerning/aw_nas.

Problem Overview

Predictor-based NAS relies on an architecture-performance predictor to evaluate candidate architectures efficiently, and the predictor's prediction ability strongly influences the effectiveness of the search process. Generally, the predictor first encodes the input architecture into a latent embedding and then outputs the predicted score with an MLP. Traditionally, the encoder is an LSTM, an MLP, or a GCN. In this paper, we propose a graph-based encoder that models the information flow in the architecture, which yields better embedding ability and thus better search efficiency.

Method

GATES Architecture

To encode a cell architecture into an embedding vector, GATES follows the idea of modeling the information flow in the architecture, and uses the output information as the embedding of the architecture.

Specifically, the encoding process of GATES mimics the forward process of the neural network. In the NN forward process, an image is taken as the input data, and each operation processes the data. In the encoding process of GATES, a piece of “virtual information” is taken as the input node embedding, and each operation is modeled as a transformation of the propagated information. For example, in the red box, the computation of the feature map F2 at node 2 in the NN forward process is F2 = Conv3x3(F0 + F1). Analogously, the transformation of the “virtual information” in the encoding process at node 2 is N2 = m2 ⊙ (N0 + N1), where ⊙ denotes elementwise multiplication, and m2 = σ(GetEmbedding(O, Conv3x3) Wo) is the attention mask corresponding to the operation type Conv3x3. Here, σ is the sigmoid function, O is the embedding table of the No operation types, and Wo is a linear transformation matrix.
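The following is a minimal PyTorch sketch of this node update, written to match the formula above; the identifiers (emb_dim, op_emb, W_o, node_update) are illustrative and not the exact ones used in the aw_nas implementation.

import torch
import torch.nn as nn

emb_dim = 32
num_op_types = 5                               # "No" operation types in the search space
op_emb = nn.Embedding(num_op_types, emb_dim)   # O: learnable operation embeddings
W_o = nn.Linear(emb_dim, emb_dim, bias=False)  # linear transformation for the mask

def node_update(prev_infos, op_idx):
    """Compute N_j = m_j ⊙ (sum of predecessor information).

    prev_infos: list of (batch, emb_dim) tensors, the "virtual information"
                propagated from the predecessor nodes (e.g. N0, N1).
    op_idx:     index of the operation type on this node (e.g. Conv3x3).
    """
    summed = torch.stack(prev_infos, dim=0).sum(dim=0)        # N0 + N1
    mask = torch.sigmoid(W_o(op_emb(torch.tensor(op_idx))))   # m_j = sigmoid(Emb(o) Wo)
    return mask * summed                                      # elementwise gating

# Example: node 2 with a Conv3x3 operation (index 2 here, purely illustrative)
N0 = torch.randn(1, emb_dim)   # input "virtual information"
N1 = torch.randn(1, emb_dim)
N2 = node_update([N0, N1], op_idx=2)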

Ranking Loss

What is really required to guide the architecture search is the relative ranking order of architectures rather than their absolute performance values. We therefore adopt Kendall's Tau ranking correlation as the direct criterion for evaluating the architecture predictor, and propose a ranking loss instead of a regression loss to train the predictor.
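As a reference, here is a minimal sketch of a pairwise hinge ranking loss of the kind referred to above; the margin value and the pair construction are illustrative choices, not necessarily the exact ones used in the paper.

import torch

def pairwise_hinge_ranking_loss(scores, perfs, margin=0.1):
    """scores, perfs: 1-D tensors of shape (batch,).

    For every pair (i, j) with perf_i > perf_j, penalize the predictor
    unless score_i exceeds score_j by at least `margin`.
    """
    diff_score = scores.unsqueeze(1) - scores.unsqueeze(0)      # score_i - score_j
    better = (perfs.unsqueeze(1) > perfs.unsqueeze(0)).float()  # 1 if perf_i > perf_j
    loss = torch.relu(margin - diff_score) * better
    return loss.sum() / better.sum().clamp(min=1.0)

# Usage: predicted scores from the GATES encoder + MLP head, true accuracies from the benchmark
scores = torch.tensor([0.3, 0.8, 0.5])
perfs = torch.tensor([0.91, 0.94, 0.90])
print(pairwise_hinge_ranking_loss(scores, perfs))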

Illustration of the encoding process of GATES for a cell architecture

Architecture Search Framework

The figure below illustrates the overall search flow of our proposed method. To begin with, we randomly sample some architectures from the search space and evaluate their ground-truth performance. Then, we train the initial GATES predictor with the ranking loss on the obtained architecture-performance pairs. After that, we conduct an inner search (e.g., random sampling, evolutionary search) based on the predicted scores. The sampled architectures are then evaluated to get their ground-truth performance and used to fine-tune the predictor. By iterating this process, the algorithm can discover high-performance neural architectures.
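A high-level sketch of this search loop is given below; search_space, evaluate_ground_truth, train_predictor, and inner_search are hypothetical helpers standing in for the corresponding components (e.g., in aw_nas), and their exact interfaces are assumptions.

def predictor_based_nas(search_space, num_stages=5, init_samples=100, samples_per_stage=50):
    # Stage 0: randomly sampled architectures with ground-truth evaluation
    archs = [search_space.random_sample() for _ in range(init_samples)]
    perfs = [evaluate_ground_truth(a) for a in archs]

    predictor = train_predictor(archs, perfs)  # GATES encoder + ranking loss

    for _ in range(num_stages):
        # Inner search (random sampling or evolutionary search) guided by the predicted scores
        candidates = inner_search(search_space, predictor, num=samples_per_stage)
        new_perfs = [evaluate_ground_truth(a) for a in candidates]

        # Fine-tune the predictor on the newly evaluated architectures
        archs += candidates
        perfs += new_perfs
        predictor = train_predictor(archs, perfs, init=predictor)

    # Return the best architecture found according to ground-truth evaluation
    return max(zip(archs, perfs), key=lambda x: x[1])[0]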

The overview of the proposed algorithm. Upper: The general flow of predictor-based NAS. Lower: Illustration of the encoding process of GATES for an OON cell architecture.

Results

Architecture Performance Prediction

First, we evaluate our proposed GATES encoder and ranking loss on two common NAS benchmarks, NAS-Bench-101 and NAS-Bench-201, with different proportions of the training samples. We report the Kendall's Tau correlation between the predicted scores and the actual performance as the criterion of prediction ability.
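For clarity, the evaluation criterion can be computed as below (a minimal sketch using scipy; the arrays are illustrative placeholders, not benchmark data).

from scipy.stats import kendalltau

# Kendall's Tau rank correlation between predicted scores and
# ground-truth accuracies on held-out architectures
predicted_scores = [0.12, 0.55, 0.31, 0.78]
true_accuracies = [0.901, 0.935, 0.912, 0.941]

tau, p_value = kendalltau(predicted_scores, true_accuracies)
print(f"Kendall's Tau: {tau:.4f}")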

Our proposed method consistently outperforms other methods. For example, on NAS-Bench-201, GATES + Pairwise (Hinge) achieves 0.7401 Kendall's Tau correlation, much better than LSTM + Pairwise (Hinge) (0.5550) and LSTM + Regression (MSE) (0.4405).

The Kendall's Tau of using different loss functions on NAS-Bench-101. The first 90% (381262) architectures in the dataset are used as the training data, and the other 42362 architectures are used as the testing data. All experiments except “Regression (MSE) + GCN” are carried out with the GATES encoder.

The Kendall's Tau of using different encoders and loss functions on NAS-Bench-201. The first 50% (7813) architectures in the dataset are used as the training data, and the other 7812 architectures are used as the testing data.

Architecture Search

Equipped with a better performance predictor, the sample efficiency of the predictor-based NAS process can be significantly improved. To verify this, we conduct predictor-based architecture search on NAS-Bench-101 using GATES and various baseline encoders (i.e., LSTM, MLP, GCN).

The results of running predictor-based NAS with different encoders are shown in the figure below. We conduct experiments with two inner search methods: random search and an evolutionary algorithm. We can see that the sample efficiency using GATES surpasses that of the baselines under both inner search methods. This verifies the analysis that utilizing a better neural architecture encoder in the predictor-based NAS flow leads to better sample efficiency.
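For reference, a random-sampling inner search guided by the predictor could look like the sketch below; search_space and predictor (with an assumed predict interface) are the same hypothetical objects as in the search-loop sketch above, and the candidate/top-k sizes are illustrative.

def random_sample_inner_search(search_space, predictor, num_candidates=1000, top_k=50):
    # Sample candidates, score them cheaply with the predictor,
    # and keep the top-scoring ones for ground-truth evaluation.
    candidates = [search_space.random_sample() for _ in range(num_candidates)]
    scores = [predictor.predict(arch) for arch in candidates]
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [arch for arch, _ in ranked[:top_k]]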

Comparison of predictor-based NAS with different encoders: The best validation accuracy during the search process over 10/15 runs for the RS and EA inner search method, respectively. r is the sample ratio.

Comparison of NAS-discovered architectures on ImageNet

Then we conduct architecture search in the DARTS search space and evaluate the final performance on the CIFAR-10 dataset. Our discovered architecture achieves a 2.58% test error, surpassing ENAS (2.89%) and DARTS (2.76%) by a clear margin, which demonstrates the effectiveness of our proposed method.

BibTex

@inproceedings{ning2020generic,
  title={A generic graph-based neural architecture encoding scheme for predictor-based nas},
  author={Ning, Xuefei and Zheng, Yin and Zhao, Tianchen and Wang, Yu and Yang, Huazhong},
  booktitle={European Conference on Computer Vision},
  pages={189--204},
  year={2020},
  organization={Springer}
}