# AutoML group

Organizers: Heri (heri@lri.fr), Zhengying (zhengying.liu@inria.fr)

Link to Zotero AutoML group

**Next meeting**

**2 Jul 2020 (Thursday) at 17h** (Paris time)

Room 2014, DIGITEO (Bat. 660) or via **video call link**

Add to calendar / TAU calendar (Zimbra login required)

Presenter:

Dhiaeddine YOUSFI

(slides to be added)

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Cycle of presenters: Zhengying, Heri, Saumya, Romain, Adrian, Dhiaeddine Yousfi, Martin Cepeda, Xiao Shou (TO ADD MORE)

Place to put questions during the presentation: Overleaf link

**To-Read List**

Link to this list in Zotero group.

Everyone is welcome to suggest and add papers to read (send your Zotero username or email address to the organizers).

**Past Meetings**

**Meeting of 11/06/2020**

Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Presenter: Zhengying

Participants: all TAUers were there!

**Main idea of the article:** Empirical work that hypothesizes the existence of winning tickets (sub-networks that train faster and reach better performance) and sheds light on the behavior of SGD.
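
One pruning round of the procedure behind this hypothesis can be sketched as follows: train, drop the smallest-magnitude weights, and reset the survivors to their initial values. This is only a minimal illustration on a flat list of weights; `magnitude_mask`, `rewind`, and the toy numbers are hypothetical and do not come from the paper's code.

```python
def magnitude_mask(weights, prune_fraction):
    """Return a 0/1 mask that removes the smallest-magnitude weights."""
    n_prune = int(len(weights) * prune_fraction)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def rewind(initial_weights, mask):
    """Reset surviving weights to their initial values; pruned ones stay at zero."""
    return [w0 * m for w0, m in zip(initial_weights, mask)]


# Toy values (assumed): weights at initialization and after training.
initial = [0.5, -0.1, 0.3, -0.7, 0.05, 0.2]
trained = [0.9, -0.02, 0.4, -1.1, 0.01, 0.15]

mask = magnitude_mask(trained, prune_fraction=0.5)
ticket = rewind(initial, mask)  # candidate "winning ticket" to retrain
```

In the iterative variant, this round is repeated: the masked network is retrained from `ticket`, and a further fraction of the remaining weights is pruned.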

**Meeting of 04/06/2020**

Paper: DAG-GNN: DAG Structure Learning with Graph Neural Networks

Presenter: Shou Xiao

Participants: Heri, Isabelle, Kristin Bennett, Zhengying

**Main idea of the article:** Structure learning: learn the weighted adjacency matrix of a DAG by formulating a generalized linear SEM as an optimization problem in the VAE framework.
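
To keep the learned graph acyclic, this line of work adds an equality constraint on the adjacency matrix $A$ of an $m$-node graph, of the form $\mathrm{tr}[(I + \alpha A \circ A)^m] - m = 0$ with $\alpha > 0$. The sketch below evaluates that quantity in plain Python; the function names and the default `alpha = 1/m` are assumptions made for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(A, alpha=None):
    """Return tr[(I + alpha * A∘A)^m] - m, which is zero iff A encodes a DAG."""
    m = len(A)
    if alpha is None:
        alpha = 1.0 / m  # assumed default, any positive value works
    # M = I + alpha * (A squared element-wise)
    M = [[(1.0 if i == j else 0.0) + alpha * A[i][j] ** 2 for j in range(m)]
         for i in range(m)]
    P = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(m):
        P = matmul(P, M)
    return sum(P[i][i] for i in range(m)) - m


dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]   # edges 0 -> 1 -> 2, no cycle
cyclic = [[0.0, 1.0],
          [1.0, 0.0]]     # edges 0 -> 1 and 1 -> 0

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

Because the quantity is a smooth function of $A$, it can be used as a penalty or constraint inside a gradient-based optimizer instead of a combinatorial search over graphs.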

**Meeting of 28/05/2020**

Paper: The geography of COVID-19 spread in Italy and implications for the relaxation of confinement measures

Presenter: Martin Cepeda

Participants: all TAUers were there!

**Main idea of the article:** Using transport-based contact matrices between Italian regions, the authors were able to effectively model bed occupancy, disease spreading and possible post lock-down scenarios for all 107 Italian provinces.

Slides

**Meeting of 08/05/2020**

Paper1: Geometry-Aware Gradient Algorithms for Neural Architecture Search

Paper2: Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

Presenter: Romain Egelé, Adrian El Baz

Participants: all TAUers were there!

**Main idea of the article 1:**

The article focuses on gradient-based neural architecture search with weight sharing. It builds a formal framework to understand the optimization of these methods. It introduces an exponentiated gradient update to accelerate convergence to a stationary point, and a KL-divergence regularization to increase the sparsity of the mixture weights. The authors also introduce random search with weight sharing as a new baseline for neural architecture search. Finally, the authors benchmark existing methods on CIFAR-10 and test an HPO algorithm, ASHA, on the NAS problem.
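
As a rough illustration of the exponentiated gradient idea on mixture weights, the sketch below applies one multiplicative update to a vector on the probability simplex. The function name, the learning rate, and the toy gradient are assumptions; this is not the paper's implementation.

```python
import math


def exponentiated_gradient_step(theta, grad, lr):
    """One multiplicative update that keeps theta on the probability simplex."""
    unnormalized = [t * math.exp(-lr * g) for t, g in zip(theta, grad)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three candidate operations with uniform mixture weights (toy values).
theta = [1 / 3, 1 / 3, 1 / 3]
grad = [1.0, 0.0, -1.0]  # assumed gradient of the loss w.r.t. theta

new_theta = exponentiated_gradient_step(theta, grad, lr=0.5)
```

Unlike an additive gradient step followed by a projection, the multiplicative form keeps every weight positive and normalized, and it shrinks the weights of poorly performing operations geometrically, which favors sparse mixtures.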

**Main idea of the article 2: #TODO (@Adrian)**

Slides:

paper1: slides

paper2 **#TODO (@Adrian): please add link here**

**Meeting of 19/03/2020**

Paper: Evolving Neural Networks through Augmenting Topologies

Presenter: Saumya Jetley

Participants: all TAUers were there!

**Main idea of the article: #TODO (@Saumya)**

Slides: **#TODO (@Saumya)**

**Meeting of 3/10/2019**

Paper: Each one's 3 favorite papers in ECML PKDD 201

Presenter: Guillaume Doquet, Michèle Sebag, Pierre Wolinski, Victor Berger, Zhengying Liu

Participants: all TAUers were there!

**Main idea of the article: **

Slides: **#TODO**

**Meeting of 11/07/2019**

Paper: Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "Darts: Differentiable architecture search." ICLR 2019

Presenter: Zhen

Participants: Zhengying, Zhen

**Main idea of the article:** Apply a continuous relaxation to the choice of architecture, similar to applying a lasso to the architecture.

Slides: **#TODO: @Zhen**

**Meeting of 20/06/2019**

Paper: Adam Gaier and David Ha, "Weight Agnostic Neural Networks". arxiv

Presenter: Zhengying

Participants: Lisheng, Zhen, Nathan

**Main idea of the article:** Learning / training ONLY the architecture of neural networks using neuroevolution (NEAT), with UNIFORM (or random) weight for ALL neurons.

**Meeting of 23/05/2019**

Paper: Chelsea Finn et al. "Online Meta-Learning". ICML 2019

Presenter: Lisheng

Participants: Zhengying, Heri, Zhen, Nathan

**Main idea of the article: #TODO: @Lisheng**

Slides **#TODO: @Lisheng**

**Meeting of 16/05/2019**

Paper: Lukas Hahn, et al. “Fast and Reliable Architecture Selection for Convolutional Neural Networks”

Presenter: Adrien

Participants: Zhen, Nathan, Zhengying, Heri, Isabelle

**Main idea of the article: #TODO: @Adrien**

Slides **#TODO: @Adrien**

**Meeting of 02/05/2019**

Paper: Marcin Andrychowicz, et al. “Learning to learn by gradient descent by gradient descent”

Presenter: Nathan

Participants: Zhen, Loris, Zhengying

**Main idea of the article: #TODO: @Nathan**

Slides **#TODO: @Nathan**

**Meeting of 25/04/2019**

Paper: Haifeng Jin, et al. “Auto-Keras: An Efficient Neural Architecture Search System.”

Presenter: Heri

Participants: Guillaume, Isabelle, Lisheng, Nathan, Zhen, Zhengying

**Main idea of the article:** This paper introduces a new distance metric between two neural architectures. The metric is composed of a layer-wise distance and a connection-wise distance.

Slides

**Meeting of 11/04/2019**

**#TODO: @Zhen**

**Meeting of 28/03/2019**

Paper: Franceschi et al. "Bilevel Programming for Hyperparameter Optimization and Meta-Learning". ICML 2018

Presenter: Zhengying

Participants: Heri, Pierre

**Main idea of the article:** Use a bi-level optimization formulation to unify hyperparameter optimization and meta-learning, then use a smooth procedure (a sequence of gradient descent steps) to approximate how the solution of the inner problem depends on the outer variables (as in the bi-level formulation).

**Meeting of 21/03/2019**

No specific paper. Slides on Automated Deep Learning were presented.

Presenter: Zhengying

Participants: Zhen

**Main idea of the article:** Formulate AutoML as two layers of learning: supervised learning + reinforcement learning.

**Meeting of 07/03/2019**

Paper: de Laroussilhe, Quentin, et al. "Neural Architecture Search Over a Graph Search Space." arXiv preprint arXiv:1812.10666 (2018).

Presenter: Guillaume Doquet

Participants: Heri, Zhengying, Lisheng, Marc, Guillaume C, Adrien, Zhen

**Main idea of the article: #TODO: @Guillaume**

Slides **#TODO: @Guillaume**

**Meeting of 31/01/2019**

Paper: Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML2017

Presenter: Pierre

Participants: Heri, Zhengying, Lisheng

**Main idea of the article: #TODO: @Pierre**

Slides **#TODO: @Pierre**

**Meeting of 24/01/2019**

Paper: Sun-Hosoya et al: ActivMetaL: Algorithm Recommendation with Active Meta Learning (Related papers: paper1, paper2)

Presenter: Lisheng

Participants: Lisheng, Zhengying, Loris

**Main idea of the article:** Cast the AutoML problem as a recommender system: treat unknown performances as missing values in the meta-learning matrix and use matrix factorization techniques (non-probabilistic vs. probabilistic) to complete that matrix.

**Meeting of 15/11/2018**

Paper: Falkner et al: BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Presenter: Heri

Participants: Michèle, Pierre, Loris, Guillaume.

**Main idea of the article:** Improve the Hyperband approach by using Bayesian Optimization to sample hyperparameter configurations (instead of the uniform sampling used in standard Hyperband).
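
The resource-allocation loop that Hyperband builds on (successive halving) can be sketched as below. The toy loss, the candidate values, and the function names are assumptions for illustration. In this sketch the candidates are given as a fixed list; Hyperband would draw them uniformly at random, and BOHB would instead draw them from a model fitted to earlier results.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations at each round, with eta times more budget."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank survivors by loss at the current budget (lower is better).
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    """Hypothetical validation loss: best near 0.3, improves with more budget."""
    return (learning_rate - 0.3) ** 2 + 1.0 / budget


candidates = [0.05, 0.9, 0.32, 0.6, 0.45, 0.2, 0.75, 0.1, 0.55]
best = successive_halving(candidates, toy_loss)
```

With nine candidates and `eta=3`, the loop evaluates nine configurations at budget 1, three at budget 3, and returns the single survivor.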

Slides

**Meeting of 25/10/2018**

Paper: Madrid et al: Towards AutoML in the presence of Drift: first results

Presenter: Zhengying

Participants: Loris, Heri, Marc, Michèle, Guillaume Charpiat

**Main idea of the article:** In a lifelong learning + concept drift + AutoML setting, the authors use auto-sklearn + a drift detector + several model adaptation methods (e.g. completely re-train the model, update weights, add a model, etc.) and obtain some basic results for the AutoML3 challenge.

**Meeting of 11/10/2018**

Paper: Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston "SMASH: One-Shot Model Architecture Search through HyperNetworks".

Presenter: Guillaume Doquet

Participants: Loris, Heri, Zhengying, Pierre, Guillaume Charpiat

**Main idea of the article:** **#TODO: Guillaume**

Slides (**#TODO: Guillaume**)

**Meeting of 20/09/2018**

Paper: Lorraine, Jonathan, and David Duvenaud. "Stochastic Hyperparameter Optimization through Hypernetworks." arXiv preprint arXiv:1802.09419 (2018).

Presenter: Pierre

Participants: Michèle, Marc, Guillaume Charpiat, Heri, Zhengying

**Main idea of the article:** **#TODO: Pierre**

Slides (**#TODO: Pierre**)

**Meeting of 14/06/2018**

Paper: Linnan Wang, Yiyang Zhao, Yuu Jinnai "AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search".

Presenter: Heri

Participants: Michèle, Marc, Zhengying, Pierre

**Main idea of the article:** Use Monte Carlo Tree Search to find an optimal neural network architecture. The proposed approach exploits the "block" design introduced in NAS (Neural Architecture Search). Another component that speeds up the search is the meta-DNN model, which predicts the performance of a given architecture.

Slides

**Meeting of 07/06/2018**

**PhD seminar**

Paper : Wolpert, David H., and William G. Macready. "No free lunch theorems for optimization." IEEE transactions on evolutionary computation 1.1 (1997): 67-82.

Paper: Wolpert, David H. "The lack of a priori distinctions between learning algorithms." Neural computation 8.7 (1996): 1341-1390.

Presenter: Zhengying

Participants: Guillaume C., Lisheng, Victor B., Olivier, Diviyan, Théophile, Victor E., Giancarlo

**Main idea of the article:** Any two (optimization) algorithms work equally well when their performance is averaged across all possible problems.
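
The statement can be checked by brute force on a tiny search space. The sketch below enumerates every function from three points to {0, 1} and compares two non-revisiting search strategies by the best value found after `k` evaluations. The two strategies and all names are hypothetical examples chosen for illustration.

```python
from itertools import product

X = [0, 1, 2]  # search space
Y = [0, 1]     # possible objective values


def fixed_order(f, k):
    """Visit points 0, 1, 2 in order; return the best value after k evaluations."""
    return max(f[x] for x in X[:k])


def adaptive(f, k):
    """Start at point 2, then choose the next point based on the last observation."""
    visited = [2]
    while len(visited) < k:
        last = f[visited[-1]]
        remaining = [x for x in X if x not in visited]
        # Smallest unvisited point after observing 1, largest after observing 0.
        visited.append(remaining[0] if last == 1 else remaining[-1])
    return max(f[x] for x in visited)


def average_over_all_functions(algorithm, k):
    """Mean best-found value over all 2**3 = 8 functions from X to Y."""
    functions = list(product(Y, repeat=len(X)))
    return sum(algorithm(f, k) for f in functions) / len(functions)


averages = {k: (average_over_all_functions(fixed_order, k),
                average_over_all_functions(adaptive, k))
            for k in (1, 2, 3)}
```

For every `k`, both entries of `averages[k]` coincide: an algorithm can only outperform another on some functions by underperforming on others.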

**Meeting of 17/05/2018**

Paper : Al-Shedivat, Maruan, et al. "Continuous adaptation via meta-learning in nonstationary and competitive environments.". ICLR 2018.

Presenter: Lisheng

Participants: Zhengying, Heri, Guillaume D.

**Main idea of the article:** Learn (via optimizing over-task loss) to continuously adapt policy to nonstationary (modeled as competitive agents) environments. Solve purely RL problems.

**Meeting of 03/05/2018**

Paper : Li, Lisha, et al. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". ICLR 2017.

**# TODO: Aris**

Presenter:

Participants:

**Main idea of the article:**

Slides

**Meeting of 12/04/2018**

**Evolving the Topology of Large Scale Deep Neural Networks**

Presenter: Guillaume DOQUET

Participants: Isabelle, Marc, Michèle, Laurent, Aris, Heri, Pierre, Guillaume, Zhengying

**Main idea of the article:** Use an evolutionary strategy to learn the structure of a deep neural network. The algorithm operates on 2 levels simultaneously: the macro scale (number and type of layers) and the layer scale (parameters of that layer). Results are presented on CIFAR10, CIFAR100, MNIST, and Fashion-MNIST, beating or rivaling state-of-the-art performance.

Slides

**Meeting of 05/04/2018**

**Large-Scale Evolution of Image Classifiers**

**Learning Transferable Architectures for scalable image recognition**

Presenter: Pierre Wolinski

Participants: Zhengying, Guillaume Charpiat, Heri, ?

**Main idea of the articles:**

- 1st article: basic evolutionary algorithm;
- 2nd article: train an RNN to build a convolutional neural network, where blocks of layers have the same structure (inception-like networks).

Slides : 1st article, 2nd article

**Meeting of 29/03/2018**

**Monte Carlo tree search for algorithm selection (MOSAIC)**

Presenter: Heri

Participants: Zhengying, Guillaume (Charpiat, Doquet), Pierre, Aris

**Main idea of the article:** Tackle the hyperparameter optimization problem using Monte Carlo tree search: state (values of the hyperparameters already chosen), action (value of the next hyperparameter to choose), reward (CV score). The algorithm has two parts: a bandit part (for algorithm selection: random forest, SVM, ...) and MCTS (for preprocessing and algorithm configuration). This new idea produces good first results, but many improvements remain to be made.
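
The bandit part can be pictured with a standard UCB1 rule over candidate algorithms. UCB1 is used here only as a familiar stand-in and is not necessarily the rule used in MOSAIC; the arm names and the fixed CV scores are toy assumptions.

```python
import math


def ucb1_select(counts, totals, t):
    """Pick the arm with the highest upper confidence bound at round t."""
    def index(arm):
        if counts[arm] == 0:
            return float("inf")  # try every arm once first
        mean = totals[arm] / counts[arm]
        return mean + math.sqrt(2 * math.log(t) / counts[arm])
    return max(counts, key=index)


# Hypothetical, noise-free CV score returned by each algorithm.
cv_score = {"random_forest": 0.8, "svm": 0.6, "knn": 0.5}

counts = {arm: 0 for arm in cv_score}
totals = {arm: 0.0 for arm in cv_score}
for t in range(1, 201):
    arm = ucb1_select(counts, totals, t)
    counts[arm] += 1
    totals[arm] += cv_score[arm]  # reward observed for the chosen algorithm
```

After the loop, `counts` shows how the evaluation budget was split: the exploration bonus keeps every algorithm in play, while the one with the highest score receives at least as many pulls as any other.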

Slides

**Meeting of 22/03/2018**

**Synthesis of AutoML Reading Group (21/11/2017-22/03/2018)** (yes with color! and you can do that too!)

Presenter: Zhengying

Participants: Isabelle, the 2 Guillaumes, Heri, Pierre, Lisheng

**Main idea of the article:** Describe the AutoML problem in a comprehensive and intuitive optimization manner, formulate many existing AutoML approaches in a uniform way, attach each approach to one step in the classic machine learning pipeline and make some discussion on future research ideas.

**Meeting of 08/03/2018**

Paper : Swersky, Kevin, Jasper Snoek, and Ryan Prescott Adams. "Freeze-thaw Bayesian optimization." arXiv preprint arXiv:1406.3896 (2014).

Presenter: Lisheng

Participants: Isabelle, Marc, Guillaume Charpiat, Heri, Pierre, Zhengying

**Main idea of the article:** A strategy for efficiently choosing hyperparameters: pause the training of models that are not promising. Training curves are modeled as samples from a Gaussian process.

**Meeting of 01/03/2018**

Paper : M. Feurer, A. Klein, K. Eggensperger, J.T. Springenberg, M. Blum, F. Hutter "Efficient and Robust Automated Machine Learning" (NIPS 2015)

Presenter: Guillaume Doquet

Participants: Pierre, Zhengying, Isabelle, Guillaume Charpiat

**Main idea of the article:** Combine meta-learning, tree-based Bayesian Optimization (SMAC) and ensemble method (ensemble selection) to tackle AutoML problems.

**Meeting of 22/02/2018**

Paper: Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown (*International Conference on Learning and Intelligent Optimization* 2011)

Presenter: Pierre

Participants: Zhengying, Guillaume D, Guillaume C, Lisheng, Isabelle

**Main idea of the article**: Presentation of the hyperparameter search algorithm SMAC. SMAC is an SMBO-based algorithm that uses random forests to model performance as a function of the hyperparameters. Moreover, it handles the case where one tunes the hyperparameters for multiple *instance sets*. (Slides)

Questions & Remarks: *instance set* possibly refers to a split of a data set between a train set and a validation set

**Meeting of 08/02/2018**

Paper: R. Bardenet, M. Brendel, B. Kégl, M. Sebag "Collaborative hyperparameter tuning" ICML (2013).

Presenter: Heri

Participants: Zhengying, Isabelle, Aris, Pierre, Heri

**Main idea of the article:** By collaboratively tuning hyperparameters on multiple datasets, one can incorporate (expert) knowledge from similar tasks to improve Bayesian hyperparameter search. Hyperparameter ranking is used (instead of the validation score) to assess the quality of a hyperparameter setting. (slides)

**Meeting of 01/02/2018**

Paper: Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., & Murphy, K. "Progressive neural architecture search." arXiv preprint arXiv:1712.00559 (2017).

Presenter: Aris

Participants: Guillaume Charpiat, Guillaume Doquet, Heri, Lisheng, Zhengying, Aris

**Main idea of the article:** Learn an RNN that estimates the quality of a CNN sub-module ("cell") generated using multiple blocks (each chosen from some fixed options of convolutions and pooling operators). When expanding the cell structure, the RNN is used to prune the search space. (Slides)

**Meeting of 25/01/2018**

Paper : Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu. "Spatial transformer networks." arXiv preprint arXiv:1506.02025v3 (2016).

Presenter : Lisheng

Participants : Michèle, Guillaume Charpiat, Aris, Heri, Guillaume Doquet

**Main idea of the article:** A spatial transformer network (STN), which learns an appropriate transformation of the input feature map, is proposed as a module to insert into an existing architecture to make the task of the later layers (e.g. classification) easier. This is possible mainly because the STN is differentiable. (Slides)

To-do:

- Idea for potential use: A spatial transformer network can be viewed as a data generator constrained by final performance of the entire network.

**Meeting of 18/01/2018**

Paper : Koutník, Jan, Juergen Schmidhuber, and Faustino Gomez. "A frequency-domain encoding for neuroevolution." arXiv preprint arXiv:1212.6521 (2012).

Presenter : Zhengying

Participants : Michèle, Isabelle, Guillaume Charpiat, Lisheng, Aris, Heri, Guillaume Doquet

**Main idea of the article:** Solve the octopus-arm problem by using a few Fourier coefficients (the chromosome) to compactly represent recurrent neural networks, and by using Natural Evolution Strategies to select a promising prior distribution over neural networks. (Slides)

To-do:

- Heri and Zhengying will run experiments following Michèle's idea

**Meeting of 11/01/2018**

Paper : Ravid Shwartz-Ziv and Naftali Tishby, "Opening the Black Box of Deep Neural Networks via Information" (Arxiv, March 2017)

Presenter : Guillaume Doquet

Participants : Guillaume Charpiat, Cyril, Zhengying, Heri, Guillaume Doquet

**Main idea of the article:** Deep neural networks go through 2 distinct phases during training. In the first phase, the mutual information between each hidden layer and the labels increases. In the second phase, the mutual information between the layers and the data decreases. In other words, a compressed latent representation of the data is found. This is a byproduct of the stochastic nature of the gradient descent. (Slides)

**Meeting of 04/01/2018**

Paper: Saxe, Andrew M., et al. "On Random Weights and Unsupervised Feature Learning" (ICML 2011).

Presenter: Pierre Wolinski

Participants: Michèle, Guillaume Doquet, Guillaume Charpiat, Zhengying, Heri, Aris

**Main idea of the article**: In some cases, untrained neural networks are almost as accurate as trained neural networks. By studying the Fourier transform of the convolution, we are able to explain these results. Moreover, the article gives a heuristic for architecture selection. (Slides)

To-do:

- Compare the eigenvalue distribution of a random filter and a trained filter
- Replace the random layer applied over an image with the Fourier transform of the image => better ? worse ?

**Meeting of 21/12/2017**

Paper: Domhan et al. "Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves" (IJCAI 2015).

Presenter: Heri

Participants: Michèle, Guillaume (Doquet, Charpiat), Isabelle, Pierre, Zhengying

**Main idea of the article**: Speed up hyperparameter search by predicting the performance of the model. Use a set of parametric functions to extrapolate the learning curve. Stop runs that are unlikely to beat the best performance observed so far. (Slides)

To-do:

- Run the code on new dataset

**Meeting of 30/11/2017**

Paper: Klein, Aaron, et al. "Fast bayesian optimization of machine learning hyperparameters on large datasets." *arXiv preprint arXiv:1605.07079* (2016).

Presenter: Zhengying

Participants: Isabelle, Michèle, Marc, Lisheng, Guillaume (Doquet), Heri

**Main idea of the article**: Use Bayesian Optimization for hyperparameter selection, with faster training (and thus faster loss evaluation) on sampled subsets of the data. The strategy chooses the next point to evaluate by maximizing the information gain per unit of computational cost about the location of the global minimum of the objective function (e.g. validation error w.r.t. the hyperparameters).

Slides: 10 pages, contains also a very brief introduction to Bayesian Optimization and Gaussian Process, with a small exercise ;)

Remarks & questions:

- In Bayesian Optimization, the Bayesian philosophy is applied, but Bayes theorem is not used
- Isabelle and Lisheng are using an approach based on a similar idea (maximize knowledge gain per computational cost), which is even more general

To-do:

- Zhengying will run the code of the authors to test the performance of the algorithm

**Meeting of 21/11/2017**

Paper: Munoz, Mario A., et al. "Instance Spaces for Machine Learning Classification." *Mach. Learn* (2017).

Presenter: Guillaume Doquet

Participants: Michèle, Lisheng, Heri, Zhengying

**Main idea of the article**: Extend the Algorithm Selection Problem framework suggested by Rice to gain knowledge of how well different combinations of algorithms and datasets can perform, or to objectively measure the performance of an algorithm, using a 2-D visualisation in a so-called instance space. For each instance (a dataset and a classification problem), many features are computed and then selected. A performance measure is adopted, then an SVM is used for fitting.