AutoML group

Organizers: Heri, Zhengying
Link to Zotero AutoML group

Next meeting
24 January 2019 (Thursday) at 17h (Paris time)
Room 2014, DIGITEO (Bat. 660) or via video call link
Check the calendar (Zimbra login required)

Lisheng Sun-Hosoya
Sun-Hosoya et al: ActivMetaL: Algorithm Recommendation with Active Meta Learning
Related papers: paper1, paper2

Cycle of presenters: Guillaume, Zhengying, Heri, Lisheng, Pierre

To-Read List

Link to this list in Zotero group.

Everyone is welcome to suggest and add papers to read (send your Zotero username or email address to the organizers).

Papers to Read

Past Meetings

Meeting of 15/11/2018

Paper: Falkner et al: BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Presenter: Heri

Participants: #TODO: @Heri

Main idea of the article: #TODO: @Heri

Slides #TODO: @Heri

Meeting of 25/10/2018

Paper: Madrid et al: Towards AutoML in the presence of Drift: first results

Presenter: Zhengying

Participants: Loris, Heri, Marc, Michèle, Guillaume Charpiat

Main idea of the article: In a setting that combines lifelong learning, concept drift and AutoML, the authors use auto-sklearn together with a drift detector and several model adaptation methods (e.g. completely retrain the model, update weights, add a model, etc.) and report some basic results for the AutoML3 challenge.

Meeting of 11/10/2018

Paper: Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston "SMASH: One-Shot Model Architecture Search through HyperNetworks".

Presenter: Guillaume Doquet

Participants: Loris, Heri, Zhengying, Pierre, Guillaume Charpiat

Main idea of the article: #TODO: Guillaume

Slides (#TODO: Guillaume)

Meeting of 20/09/2018

Lorraine, Jonathan, and David Duvenaud. "Stochastic Hyperparameter Optimization through Hypernetworks." arXiv preprint arXiv:1802.09419 (2018).

Presenter: Pierre

Participants: Michèle, Marc, Guillaume Charpiat, Heri, Zhengying

Main idea of the article: #TODO: Pierre

Slides (#TODO: Pierre)

Meeting of 14/06/2018

Paper: Linnan Wang, Yiyang Zhao, Yuu Jinnai, et al. "AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search".

Presenter: Heri

Participants: Michèle, Marc, Zhengying, Pierre

Main idea of the article: Use Monte Carlo Tree Search to find an optimal neural network architecture. The proposed approach exploits the "block" design introduced in NAS (Neural Architecture Search). Another component that speeds up the search is the meta-DNN model, which predicts the performance of a given architecture.

Meeting of 07/06/2018
PhD seminar

Paper :  Wolpert, David H., and William G. Macready. "No free lunch theorems for optimization." IEEE transactions on evolutionary computation 1.1 (1997): 67-82.
Paper :  Wolpert, David H. "The lack of a priori distinctions between learning algorithms." Neural computation 8.7 (1996): 1341-1390.

Presenter: Zhengying

Participants: Guillaume C., Lisheng, Victor B., Olivier, Diviyan, Théophile, Victor E., Giancarlo

Main idea of the article: Any two (optimization) algorithms work equally well when their performance is averaged across all possible problems.
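The statement above can be checked by brute force on a tiny problem. The sketch below enumerates every function from a 3-point domain to {0, 1} and compares two search strategies that never revisit a point. The domain, the two strategies and the performance measure (best value found within a budget) are illustrative assumptions, not taken from the papers.

```python
from itertools import product

DOMAIN = (0, 1, 2)   # toy search space (assumption)
VALUES = (0, 1)      # possible objective values (assumption)


def fixed_order(history):
    """Visit 0, 1, 2 in order, ignoring the values observed so far."""
    visited = {x for x, _ in history}
    return next(x for x in DOMAIN if x not in visited)


def adaptive(history):
    """Start at 2, then let the last observed value decide where to go next."""
    if not history:
        return 2
    visited = {x for x, _ in history}
    unvisited = [x for x in DOMAIN if x not in visited]
    last_value = history[-1][1]
    return min(unvisited) if last_value == 1 else max(unvisited)


def best_after(algorithm, f, budget):
    """Best value found by `algorithm` on function `f` after `budget` evaluations."""
    history = []
    for _ in range(budget):
        x = algorithm(history)
        history.append((x, f[x]))
    return max(y for _, y in history)


def average_performance(algorithm, budget):
    """Average of best_after over all |VALUES|**|DOMAIN| possible functions."""
    functions = [dict(zip(DOMAIN, ys)) for ys in product(VALUES, repeat=len(DOMAIN))]
    return sum(best_after(algorithm, f, budget) for f in functions) / len(functions)


for budget in (1, 2, 3):
    print(budget, average_performance(fixed_order, budget), average_performance(adaptive, budget))
```

For each budget the two averages should agree, which is what the theorem predicts for algorithms that never re-evaluate a point. On any single function the two strategies can of course differ.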

Meeting of 17/05/2018

Paper :  Al-Shedivat, Maruan, et al. "Continuous adaptation via meta-learning in nonstationary and competitive environments." ICLR 2018.

Presenter: Lisheng

Participants: Zhengying, Heri, Guillaume D.

Main idea of the article: Learn (by optimizing a loss defined over tasks) to continuously adapt a policy to nonstationary environments, which are modeled here as competitive agents. The paper addresses purely RL problems.

Meeting of 03/05/2018

Paper :  Li, Lisha, et al. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". ICLR 2017.

# TODO: Aris



Main idea of the article: 


Meeting of 12/04/2018

Evolving the Topology of Large Scale Deep Neural Networks 

Presenter: Guillaume DOQUET

Participants: Isabelle, Marc, Michèle, Laurent, Aris, Heri, Pierre, Guillaume, Zhengying

Main idea of the article: Use an evolutionary strategy to learn the structure of a deep neural network. The algorithm operates on 2 levels simultaneously: the macro scale (number and type of layers) and the layer scale (parameters of that layer). Results are presented on CIFAR10, CIFAR100, MNIST, and Fashion-MNIST, beating or rivaling state-of-the-art performance.
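To make the two mutation levels concrete, here is a minimal sketch using a generic (1+lambda) evolutionary loop. It is not the paper's algorithm: the search space `LAYER_TYPES`, the mutation probabilities and `toy_fitness` (a stand-in for validation accuracy, so that no network is trained) are all hypothetical.

```python
import random

# Hypothetical search space: layer type -> parameter name -> allowed values.
LAYER_TYPES = {
    "conv": {"filters": (8, 16, 32, 64)},
    "dense": {"units": (32, 64, 128)},
}


def random_layer(rng):
    """Sample a layer type and one value for each of its parameters."""
    kind = rng.choice(sorted(LAYER_TYPES))
    params = {name: rng.choice(options) for name, options in LAYER_TYPES[kind].items()}
    return (kind, params)


def mutate(genome, rng):
    """Return a mutated copy: macro-level or layer-level change, chosen at random."""
    child = [(kind, dict(params)) for kind, params in genome]
    if rng.random() < 0.5:
        # Macro level: remove a layer (never the last one) or insert a new one.
        if len(child) > 1 and rng.random() < 0.5:
            child.pop(rng.randrange(len(child)))
        else:
            child.insert(rng.randrange(len(child) + 1), random_layer(rng))
    else:
        # Layer level: resample one parameter of one existing layer.
        kind, params = child[rng.randrange(len(child))]
        name = rng.choice(sorted(params))
        params[name] = rng.choice(LAYER_TYPES[kind][name])
    return child


def toy_fitness(genome):
    """Stand-in for validation accuracy: prefers 4 layers and 32-filter convolutions."""
    depth_penalty = abs(len(genome) - 4)
    filter_penalty = sum(abs(p.get("filters", 32) - 32) for _, p in genome) / 64
    return -depth_penalty - filter_penalty


def evolve(fitness, generations=50, offspring=4, seed=0):
    """(1+lambda) loop: keep the parent unless the best child is at least as fit."""
    rng = random.Random(seed)
    parent = [random_layer(rng)]
    history = [fitness(parent)]
    for _ in range(generations):
        children = [mutate(parent, rng) for _ in range(offspring)]
        best_child = max(children, key=fitness)
        if fitness(best_child) >= history[-1]:
            parent = best_child
        history.append(fitness(parent))
    return parent, history


best_genome, fitness_history = evolve(toy_fitness)
print(len(best_genome), fitness_history[-1])
```

In a real run, `toy_fitness` would be replaced by training the decoded network and measuring its validation accuracy, which is where almost all of the compute goes.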

Meeting of 05/04/2018

Large-Scale Evolution of Image Classifiers
Learning Transferable Architectures for scalable image recognition

Presenter: Pierre Wolinski

Participants: Zhengying, Guillaume Charpiat, Heri, ?

Main idea of the articles:
  • 1st article: basic evolutionary algorithm;
  • 2nd article: train an RNN to build a convolutional neural network, where blocks of layers have the same structure (inception-like networks).

Meeting of 29/03/2018
Monte Carlo tree search for algorithm selection (MOSAIC)

Presenter: Heri

Participants: Zhengying, Guillaume (Charpiat, Doquet), Pierre, Aris

Main idea of the article: Tackle the hyperparameter optimization problem using Monte Carlo tree search: state (values of the hyperparameters already chosen), action (value of the next hyperparameter to choose), reward (CV-score). The algorithm is composed of two parts: a bandit part (designed for algorithm selection: random forest, SVM, ...) and MCTS (for preprocessing and algorithm configuration). This new idea produces good (first) results, but many improvements remain to be made.
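The bandit part described above can be sketched with UCB1, a standard bandit rule. These notes do not say which rule MOSAIC actually uses, so UCB1 is an assumption here, and the "CV scores" are simulated with made-up means instead of training real models.

```python
import math
import random


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Pick the arm with the highest UCB1 score; untried arms are played first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(reward_fns, n_rounds, seed=0):
    """Play n_rounds of UCB1 and return pull counts and running mean rewards."""
    rng = random.Random(seed)
    counts = [0] * len(reward_fns)
    means = [0.0] * len(reward_fns)
    for _ in range(n_rounds):
        arm = ucb1_select(counts, means)
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def simulated_cv_score(mean, std=0.05):
    """Hypothetical reward: a noisy CV score clipped to [0, 1]."""
    return lambda rng: min(1.0, max(0.0, rng.gauss(mean, std)))


# Made-up average scores for three candidate algorithms.
ALGORITHMS = ["random_forest", "svm", "knn"]
reward_fns = [simulated_cv_score(0.85), simulated_cv_score(0.70), simulated_cv_score(0.60)]

counts, means = run_bandit(reward_fns, n_rounds=1000)
for name, n, m in zip(ALGORITHMS, counts, means):
    print(f"{name}: pulled {n} times, mean reward {m:.3f}")
```

In the full method, each pull of an arm would correspond to evaluating one configuration of that algorithm, with the configuration itself chosen by the MCTS part.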

Meeting of 22/03/2018
Synthesis of AutoML Reading Group (21/11/2017-22/03/2018) (yes with color! and you can do that too!)

Presenter: Zhengying

Participants: Isabelle, les 2 Guillaumes, Heri, Pierre, Lisheng

Main idea of the article: Describe the AutoML problem as an optimization problem in a comprehensive and intuitive way, formulate many existing AutoML approaches in a uniform way, attach each approach to one step of the classic machine learning pipeline, and discuss future research ideas.

Meeting of 08/03/2018
Paper :  Swersky, Kevin, Jasper Snoek, and Ryan Prescott Adams. "Freeze-thaw Bayesian optimization." arXiv preprint arXiv:1406.3896 (2014).

Presenter: Lisheng

Participants: Isabelle, Marc, Guillaume Charpiat, Heri, Pierre, Zhengying

Main idea of the article: A strategy for efficiently choosing hyperparameters: pause the training of models that are not promising. Training curves are modeled as samples from a Gaussian process.

Meeting of 01/03/2018
Paper :  M. Feurer, A. Klein, K. Eggensperger, J.T. Springenberg, M. Blum, F. Hutter "Efficient and Robust Automated Machine Learning" (NIPS 2015)

Presenter: Guillaume Doquet

Participants: Pierre, Zhengying, Isabelle, Guillaume Charpiat

Main idea of the article: Combine meta-learning, tree-based Bayesian Optimization (SMAC) and ensemble method (ensemble selection) to tackle AutoML problems.

Meeting of 22/02/2018

Paper: Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown "Sequential Model-Based Optimization for General Algorithm Configuration" (International Conference on Learning and Intelligent Optimization 2011)

Presenter: Pierre

Participants: Zhengying, Guillaume D, Guillaume C, Lisheng, Isabelle

Main idea of the article: Presentation of the hyperparameter search algorithm SMAC. SMAC is an SMBO-based algorithm that uses random forests to model performance as a function of the hyperparameters. It also handles the case where hyperparameters are tuned across multiple instance sets. (Slides)

Questions & Remarks: instance set possibly refers to a split of a data set between a train set and a validation set

Meeting of 08/02/2018

Paper:  R. Bardenet, M. Brendel, B. Kégl, M. Sebag  "Collaborative hyperparameter tuning" ICML (2013).

Presenter: Heri

Participants: Zhengying, Isabelle, Aris, Pierre, Heri

Main idea of the article: By tuning hyperparameters collaboratively on multiple datasets, one can incorporate (expert) knowledge from similar tasks to improve Bayesian hyperparameter search. Hyperparameter ranking is used (instead of the validation score) to assess the quality of a hyperparameter setting. (slides)
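One way to see why ranks are preferred over raw validation scores is that scores live on different scales across datasets, whereas ranks are directly comparable. The sketch below only illustrates that point and is not the paper's surrogate model; the configuration names and scores are made up.

```python
def rank_scores(scores):
    """Rank configurations on one dataset (1 = best); ties share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def mean_ranks(scores_by_dataset):
    """Average each configuration's rank over all datasets."""
    totals = {}
    for scores in scores_by_dataset.values():
        for name, rank in rank_scores(scores).items():
            totals[name] = totals.get(name, 0.0) + rank
    return {name: total / len(scores_by_dataset) for name, total in totals.items()}


def mean_scores(scores_by_dataset):
    """Average each configuration's raw score over all datasets."""
    totals = {}
    for scores in scores_by_dataset.values():
        for name, score in scores.items():
            totals[name] = totals.get(name, 0.0) + score
    return {name: total / len(scores_by_dataset) for name, total in totals.items()}


# Made-up validation accuracies of three hyperparameter settings on three datasets.
scores_by_dataset = {
    "dataset_a": {"cfg1": 0.91, "cfg2": 0.93, "cfg3": 0.90},
    "dataset_b": {"cfg1": 0.55, "cfg2": 0.58, "cfg3": 0.52},
    "dataset_c": {"cfg1": 0.71, "cfg2": 0.70, "cfg3": 0.99},
}

by_rank = mean_ranks(scores_by_dataset)
by_score = mean_scores(scores_by_dataset)
print("best by mean rank: ", min(by_rank, key=by_rank.get))
print("best by mean score:", max(by_score, key=by_score.get))
```

In this toy data, `cfg2` wins on two of the three datasets and has the best mean rank, while the raw-score average is dominated by the single large margin of `cfg3` on `dataset_c`.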

Meeting of 01/02/2018

Paper:  Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., & Murphy, K. "Progressive neural architecture search." arXiv preprint arXiv:1712.00559 (2017).

Presenter: Aris

Participants: Guillaume Charpiat, Guillaume Doquet, Heri, Lisheng,  Zhengying, Aris

Main idea of the article: Learn an RNN that estimates the quality of a CNN sub-module ("cell") generated using multiple blocks (each chosen from some fixed options of convolutions and pooling operators). When expanding the cell structure, the RNN is used to prune the search space. (Slides)

Meeting of 25/01/2018

Paper : Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu. "Spatial transformer networks." arXiv preprint arXiv:1506.02025v3 (2016).

Presenter : Lisheng

Participants : Michèle, Guillaume Charpiat, Aris, Heri, Guillaume Doquet

Main idea of the article :  A spatial transformer network (STN), which learns an appropriate transformation of the input feature map, is proposed as a module to insert into existing architectures in order to make the task (e.g. classification) easier for later layers. This is possible mainly because the STN is differentiable. (Slides)

  1. Idea for potential use: A spatial transformer network can be viewed as a data generator constrained by final performance of the entire network.

Meeting of 18/01/2018

Paper : Koutník, Jan, Juergen Schmidhuber, and Faustino Gomez. "A frequency-domain encoding for neuroevolution." arXiv preprint arXiv:1212.6521 (2012).

Presenter : Zhengying

Participants : Michèle, Isabelle, Guillaume Charpiat, Lisheng, Aris, Heri, Guillaume Doquet

Main idea of the article :  Solve the Octopus Arm problem by using a few Fourier coefficients (the chromosome) to compactly represent recurrent neural networks, and by using Natural Evolution Strategies to select a promising prior distribution over neural networks. (Slides)
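The compact encoding can be illustrated in one dimension: a short chromosome of cosine coefficients is expanded into a much longer weight vector by an inverse cosine transform. This is a simplified sketch with an unnormalized cosine basis; the function name and the 1-D layout are assumptions, and the paper's exact transform and weight-matrix layout may differ.

```python
import math


def decode_weights(coefficients, n_weights):
    """Expand low-frequency cosine coefficients into n_weights values.

    Computes w[n] = sum_k c[k] * cos(pi / N * (n + 0.5) * k), i.e. an
    unnormalized inverse of the DCT-II, using only the first len(c) frequencies.
    """
    return [
        sum(
            c * math.cos(math.pi / n_weights * (n + 0.5) * k)
            for k, c in enumerate(coefficients)
        )
        for n in range(n_weights)
    ]


# A hypothetical chromosome with 3 genes encodes 16 smoothly varying weights.
chromosome = [0.5, -1.0, 0.25]
weights = decode_weights(chromosome, 16)
print(len(chromosome), "genes ->", len(weights), "weights")
print([round(w, 3) for w in weights])
```

The evolutionary search then operates on the 3 coefficients rather than on the 16 weights, which is what keeps the search space small.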

  1. Heri and Zhengying will run experiments following Michèle's idea

Meeting of 11/01/2018

Paper : Ravid Shwartz-Ziv and Naftali Tishby, "Opening the Black Box of Deep Neural Networks via Information" (Arxiv, March 2017)

Presenter : Guillaume Doquet

Participants : Guillaume Charpiat, Cyril, Zhengying, Heri, Guillaume Doquet

Main idea of the article : Deep neural networks go through 2 distinct phases during training. In the first phase, the mutual information between each hidden layer and the labels increases. In the second phase, the mutual information between the layers and the data decreases. In other words, a compressed latent representation of the data is found. This is a byproduct of the stochastic nature of the gradient descent. (Slides)

Meeting of 04/01/2018

Paper: Saxe, Andrew M., et al. "On Random Weights and Unsupervised Feature Learning" (ICML 2011).

Presenter: Pierre Wolinski

Participants: Michèle, Guillaume Doquet, Guillaume Charpiat, Zhengying, Heri, Aris

Main idea of the article: In some cases, untrained neural networks are almost as accurate as trained neural networks. By studying the Fourier transform of the convolution, we are able to explain these results. Moreover, the article gives a heuristic for architecture selection. (Slides)

  1. Compare the eigenvalue distribution of a random filter and a trained filter
  2. Replace the random layer applied over an image with the Fourier transform of the image => better ? worse ?

Meeting of 21/12/2017

Paper: Domhan et al. "Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves" (IJCAI 2015).

Presenter: Heri

Participants: Michèle, Guillaume (Doquet, Charpiat), Isabelle, Pierre, Zhengying

Main idea of the article: Speed up hyperparameter search by predicting the final performance of the model. A set of parametric functions is used to extrapolate the learning curve, and runs that are unlikely to outperform the best result observed so far are stopped. (Slides)
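The stopping rule can be sketched with a single parametric family. This is a strong simplification of the summary above: it fits only one power-law curve, y(t) = a - b * t^(-c), by least squares and compares a point prediction with the best score so far, whereas the summary mentions a set of parametric functions and a likelihood of beating the best run. All function names and the exponent grid are assumptions.

```python
def fit_power_law(curve, exponents=None):
    """Least-squares fit of y(t) = a - b * t**(-c) to curve[0..], with t = 1, 2, ...

    For each candidate exponent c the model is linear in (a, b), so a and b come
    from simple linear regression; the c with the lowest squared error wins.
    Needs at least two points.
    """
    if exponents is None:
        exponents = [i / 10 for i in range(1, 31)]  # c in 0.1 .. 3.0
    n = len(curve)
    best = None
    for c in exponents:
        xs = [t ** (-c) for t in range(1, n + 1)]
        mean_x = sum(xs) / n
        mean_y = sum(curve) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, curve))
        slope = sxy / sxx            # slope equals -b
        a = mean_y - slope * mean_x
        error = sum((a + slope * x - y) ** 2 for x, y in zip(xs, curve))
        if best is None or error < best[0]:
            best = (error, a, -slope, c)
    _, a, b, c = best
    return a, b, c


def predict(params, t):
    """Evaluate the fitted curve at epoch t."""
    a, b, c = params
    return a - b * t ** (-c)


def should_stop(curve, final_epoch, best_so_far):
    """Stop if the extrapolated final score is below the best score seen so far."""
    return predict(fit_power_law(curve), final_epoch) < best_so_far


# Hypothetical partial learning curve (validation accuracy for epochs 1..10).
partial_curve = [0.9 - 0.5 * t ** (-0.5) for t in range(1, 11)]
print(predict(fit_power_law(partial_curve), 100))
print(should_stop(partial_curve, final_epoch=100, best_so_far=0.88))
```

A point estimate like this ignores the uncertainty of the extrapolation, so in practice a margin or a probabilistic criterion would be needed to avoid stopping promising runs too early.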

  1. Run the code on new dataset

Meeting of 30/11/2017

Paper: Klein, Aaron, et al. "Fast bayesian optimization of machine learning hyperparameters on large datasets." arXiv preprint arXiv:1605.07079 (2016).

Presenter: Zhengying

Participants: Isabelle, Michèle, Marc, Lisheng, Guillaume (Doquet), Heri

Main idea of the article: Use Bayesian Optimization for hyperparameter selection, with faster training (and thus faster loss evaluation) on sampled subsets of the dataset. The next point to evaluate is chosen by maximizing the information gain, per unit of computational cost, about the distribution of the global minimum of the objective function (e.g. validation error w.r.t. the hyperparameters).

Slides: 10 pages, also containing a very brief introduction to Bayesian Optimization and Gaussian Processes, with a small exercise ;)

Remarks & questions:

  1. In Bayesian Optimization, the Bayesian philosophy is applied, but Bayes theorem is not used
  2. Isabelle and Lisheng are using an approach based on a similar idea (maximize knowledge gain per computational cost), which is even more general


  1. Zhengying will run the code of the authors to test the performance of the algorithm

Meeting of 21/11/2017

Paper: Munoz, Mario A., et al. "Instance Spaces for Machine Learning Classification." Mach. Learn (2017).

Presenter: Guillaume Doquet

Participants: Michèle, Lisheng, Heri, Zhengying

Main idea of the article: Extend the Algorithm Selection Problem framework suggested by Rice, either to learn how well different combinations of algorithms and datasets can perform or to objectively measure the performance of an algorithm, using a 2-d visualisation in a so-called instance space. For each instance (a dataset and a classification problem), a large number of features are computed and then selected. A performance measure is chosen, and an SVM is then used for fitting.
