AutoML (Humania) reading group
Organizers: Haozhe, Heri, Zhengying
Link to Zotero AutoML group
Even though this reading group is called AutoML for historical reasons, you are free to present any paper you wish to share, from your own field or elsewhere.
Next meeting
Paper: TBA
Presenter: TBA
Cycle of presenters
Haozhe, Adrian, Adrien, Herilalaina, Romain, Saumya, Zhen, Zhengying
If you are not on this list but want to give a presentation, please send a message to the organizers.
Practical information
During the Covid period, the meetings will be online via Skype. The link is here:
Legacy: Room 2014, DIGITEO (Bat. 660) or via video call link
Add to calendar / TAU calendar (Zimbra login required)
Place to put questions during the presentation: Overleaf link
To-Read List
Link to this list in Zotero group.
Everyone is welcome to suggest and add papers to read (send your Zotero username or email address to the organizers).
"tau-seminars" <> (only send to this mail)
"tau" <>
"guillaume charpiat" <>
"Isabelle Guyon" <>
"Michele Sebag" <>
"doquet" <>
"fissore" <>
"Marc Schoenauer" <>
"Laurent Cetinsoy" <>
"loris felardos" <>
Michael Vaccaro <>
Hugo Sonnery <>
Past Meetings
April 21, 2021
Paper: Hyperparameter Ensembles for Robustness and Uncertainty Quantification
Presenter: Romain
Participants: Haozhe, Zhengying, Marc, Zak
building an ensemble is easy
an ensemble of models can be used for uncertainty quantification
diversity is important in an ensemble
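The second point above can be illustrated with a minimal sketch. It assumes each ensemble member outputs a list of class probabilities for one input. The function name `ensemble_predict` and the two uncertainty measures (entropy of the averaged prediction, fraction of members disagreeing with the majority vote) are illustrative choices, not the method of the paper.

```python
import math

def ensemble_predict(member_probs):
    """Combine per-member class probabilities for a single input.

    Returns the averaged probabilities, the entropy of that average
    (higher means more uncertain), and the fraction of members whose
    top class differs from the majority vote.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0)
    votes = [max(range(n_classes), key=lambda c: p[c]) for p in member_probs]
    majority = max(set(votes), key=votes.count)
    disagreement = sum(v != majority for v in votes) / n_members
    return mean, entropy, disagreement
```

A diverse ensemble whose members disagree produces a flatter average and hence a higher entropy than an ensemble whose members all agree.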
April 14, 2021
Paper: Meta Pseudo Labels
Presenter: Zhengying
Participants: Adrien, Adrian, Zhen, Haozhe, Alessandro, Hung Nguyen
Use a (constantly changing) teacher network and student network to achieve semi-supervised learning
Interesting ideas for semi-supervised learning, integrating ideas from meta-learning, NAS and other fields
Achieved 90.2% top-1 accuracy on ImageNet
March 17, 2021
Paper: Free Lunch for Few-shot Learning: Distribution Calibration
Presenter: Heri
Participants: Adrien, Adrian, Zhengying
Takeaway: One can bypass expensive fine-tuning with a simple distribution calibration over the embedded features. The paper proposes to estimate the data statistics (mean and variance) of the target task by transferring statistics from neighboring classes seen during training. The target statistics are then used to generate more examples for the support set, and a simple classifier is trained on the augmented data. The approach beats some complex and expensive few-shot learners.
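The calibration step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: features are plain lists of floats, variances are diagonal (per dimension), and the names `calibrate`, `sample_features`, `k` and `alpha` are hypothetical. Any feature preprocessing the paper applies is omitted.

```python
import random

def calibrate(support_x, base_stats, k=2, alpha=0.1):
    """Estimate mean/variance for a novel class from one support feature.

    base_stats: list of (mean, var) pairs, one per base (training) class,
    each a list of floats. Diagonal variance is an assumption of this sketch.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Pick the base classes whose means are closest to the support feature.
    nearest = sorted(base_stats, key=lambda s: sq_dist(s[0], support_x))[:k]
    n = len(nearest)
    dim = len(support_x)
    mean = [(sum(m[d] for m, _ in nearest) + support_x[d]) / (n + 1) for d in range(dim)]
    var = [sum(v[d] for _, v in nearest) / n + alpha for d in range(dim)]
    return mean, var

def sample_features(mean, var, n_samples, rng):
    """Draw synthetic features from the calibrated Gaussian."""
    return [[rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)] for _ in range(n_samples)]
```

The sampled features would then be added to the support set before fitting a simple classifier such as logistic regression.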
March 10, 2021
Paper: Cross-validation for selecting a model selection procedure
Presenter: Adrien
Participants: Adrian, Haozhe, Zhen, Zhengying
Takeaway: the following 3 statements about cross-validation (CV) are NOT generally true:
Leave-one-out (LOO) CV has smaller bias but larger variance than K-Fold CV
Better estimation of the prediction error by CV means better model selection
The best method to use for model selection is 10-fold CV (but actually it seems to perform well and is a good trade-off)
February 10, 2021
Paper: Compositional generalization through meta sequence-to-sequence learning
Presenter: Adrian
Participants: Zhen, Haozhe, Isabelle, Michael, Adrien
Takeaway: compositional learning on SCAN tasks (seq2seq) provides a testbed for experimenting with the selection of meta-training data to induce inductive biases in a model. It would be interesting to see whether similar experiments can be performed on more complex data (e.g. images, ...).
January 27, 2021
Paper: Meta-Learning Symmetries by Reparameterization
Presenter: Haozhe Sun
Participants: Adrian, Romain, Zhen, Zhengying
Takeaway: this paper presents a way to automatically learn equivariances from data that share a common underlying symmetry. The approach uses matrix factorization within a meta-learning framework.
July 2, 2020
Paper: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Presenter: Dhiaeddine YOUSFI
Meeting of 11/06/2020
Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Presenter: Zhengying
Participants: all TAUers were there!
Main idea of the article: Empirical work that states a hypothesis on the existence of winning tickets (sub-networks that train faster with better performance) and sheds light on the behavior of SGD.
Meeting of 04/06/2020
Paper: DAG-GNN: DAG Structure Learning with Graph Neural Networks
Presenter: Shou Xiao
Participants: Heri, Isabelle, Kristin Bennett, Zhengying
Main idea of the article: Structure learning: learn the weighted adjacency matrix of a DAG by formulating the generalized linear SEM as an optimization problem in the VAE framework.
Meeting of 28/05/2020
Paper: The geography of COVID-19 spread in Italy and implications for the relaxation of confinement measures
Presenter: Martin Cepeda
Participants: all TAUers were there!
Main idea of the article: Using transport-based contact matrices between Italian regions, the authors were able to effectively model bed occupancy, disease spreading and possible post lock-down scenarios for all 107 Italian provinces.
Meeting of 08/05/2020
Paper1: Geometry-Aware Gradient Algorithms for Neural Architecture Search
Paper2: Meta-Dataset : a dataset of datasets for learning to learn from few examples
Presenter: Romain Egelé, Adrian El Baz
Participants: all TAUers were there!
Main idea of the article 1:
The article focuses on gradient-based neural architecture search with weight sharing. It builds a formal framework for understanding the optimization of these methods, uses an exponentiated gradient to accelerate convergence to a stationary point, and adds a KL-divergence regularization to increase the sparsity of the mixture weights. The authors also introduce random search with weight sharing as a new baseline for neural architecture search. Finally, the authors benchmark existing methods on CIFAR-10 and test an HPO algorithm, ASHA, on the NAS problem.
Main idea of the article 2: #TODO (@Adrian)
paper1: slides
paper2 #TODO (@Adrian): please add link here
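For paper 1, a single exponentiated-gradient update on a vector of mixture weights can be sketched as follows. This is a generic illustration of the update rule on the probability simplex, not the authors' implementation. The function name, the step size `lr` and the fixed toy gradients in the usage note are assumptions.

```python
import math

def exponentiated_gradient_step(weights, grads, lr):
    """One multiplicative update that keeps the weights on the simplex.

    Each weight is scaled by exp(-lr * gradient) and the result is
    renormalized, so weights stay positive and sum to one.
    """
    unnormalized = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Starting from uniform weights and repeatedly applying the step with constant gradients such as `[1.0, 0.0, -1.0]`, the mass concentrates on the last operation, which is one way to see why multiplicative updates tend toward sparse mixture weights.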
Meeting of 19/03/2020
Paper: Evolving Neural Networks through Augmenting Topologies
Presenter: Saumya Jetley
Participants: all TAU was there!
Main idea of the article: #TODO (@Saumya)
Slides: #TODO (@Saumya)
Meeting of 3/10/2019
Paper: Each one's 3 favorite papers in ECML PKDD 201
Presenter: Guillaume Doquet, Michèle Sebag, Pierre Wolinski, Victor Berger, Zhengying Liu
Participants: all TAU was there!
Main idea of the article:
Slides: #TODO
Meeting of 11/07/2019
Paper: Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "Darts: Differentiable architecture search." ICLR 2019
Presenter: Zhen
Participants: Zhengying, Zhen
Main idea of the article: Relaxation on the architecture used, like doing lasso on the architecture
Slides: #TODO: @Zhen
Meeting of 20/06/2019
Paper: Adam Gaier and David Ha, "Weight Agnostic Neural Networks". arxiv
Presenter: Zhengying
Participants: Lisheng, Zhen, Nathan
Main idea of the article: Learning / training ONLY the architecture of neural networks using neuroevolution (NEAT), with UNIFORM (or random) weight for ALL neurons.
Meeting of 23/05/2019
Paper: Chelsea Finn et al. "Online Meta-Learning". ICML 2019
Presenter: Lisheng
Participants: Zhengying, Heri, Zhen, Nathan
Main idea of the article: #TODO: @Lisheng
Slides #TODO: @Lisheng
Meeting of 16/05/2019
Paper: Lukas Hahn, et al. “Fast and Reliable Architecture Selection for Convolution Neural Networks”
Presenter: Adrien
Participants: Zhen, Nathan, Zhengying, Heri, Isabelle
Main idea of the article: #TODO: @Adrien
Slides #TODO: @Adrien
Meeting of 02/05/2019
Paper: Marcin Andrychowicz, et al. “Learning to learn by gradient descent by gradient descent”
Presenter: Nathan
Participants: Zhen, Loris, Zhengying
Main idea of the article: #TODO: @Nathan
Slides #TODO: @Nathan
Meeting of 25/04/2019
Paper: Haifeng Jin, et al. “Auto-Keras: An Efficient Neural Architecture Search System.”
Presenter: Heri
Participants: Guillaume, Isabelle, Lisheng, Nathan, Zhen, Zhengying
Main idea of the article: This paper introduces a new distance metric between two neural architectures. The metric is composed of a layer-wise distance and a connection-wise distance.
Meeting of 11/04/2019
#TODO: @Zhen
Meeting of 28/03/2019
Paper: Franceschi et al. "Bilevel Programming for Hyperparameter Optimization and Meta-Learning". ICML 2018
Presenter: Zhengying
Participants: Heri, Pierre
Main idea of the article: Use a bi-level optimization formulation to unify Hyperparameter Optimization and Meta-learning, then use a smooth procedure (a sequence of gradient descent) to approximate the dependence of the solution of the inner problem (as in bi-level formulation).
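The idea of differentiating through a sequence of gradient descent steps can be sketched on a one-dimensional toy problem. Everything below is an assumption made for illustration: the inner (training) loss is (w - a)^2 + lam * w^2, the outer (validation) loss is (w - b)^2, and the derivative of w with respect to the hyperparameter lam is propagated by hand alongside each inner step.

```python
def unrolled_hypergradient(lam, a, b, lr=0.1, steps=50, w0=0.0):
    """Return (validation loss, d validation loss / d lam).

    The inner problem is solved approximately by `steps` gradient descent
    updates; `dw` tracks dw/dlam through those same updates.
    """
    w, dw = w0, 0.0
    for _ in range(steps):
        grad_w = 2 * (w - a) + 2 * lam * w          # d train_loss / d w
        dgrad_dlam = 2 * dw + 2 * w + 2 * lam * dw  # total derivative of grad_w wrt lam
        w, dw = w - lr * grad_w, dw - lr * dgrad_dlam
    val_loss = (w - b) ** 2
    return val_loss, 2 * (w - b) * dw
```

The returned hypergradient can be fed to any gradient-based optimizer over `lam`. On this toy problem it can be checked against a finite-difference estimate of the validation loss.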
Meeting of 21/03/2019
No specific paper. Slides on Automated Deep Learning are presented.
Presenter: Zhengying
Participants: Zhen
Main idea of the article: Formulate AutoML as two layers of learning: supervised learning + reinforcement learning.
Meeting of 07/03/2019
Paper: de Laroussilhe, Quentin, et al. "Neural Architecture Search Over a Graph Search Space." arXiv preprint arXiv:1812.10666 (2018).
Presenter: Guillaume Doquet
Participants: Heri, Zhengying, Lisheng, Marc, Guillaume C, Adrien, Zhen
Main idea of the article: #TODO: @Guillaume
Slides #TODO: @Guillaume
Meeting of 31/01/2019
Paper: Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML2017
Presenter: Pierre
Participants: Heri, Zhengying, Lisheng
Main idea of the article: #TODO: @Pierre
Slides #TODO: @Pierre
Meeting of 24/01/2019
Paper: Sun-Hosoya et al: ActivMetaL: Algorithm Recommendation with Active Meta Learning (Related papers: paper1, paper2)
Presenter: Lisheng
Participants: Lisheng, Zhengying, Loris
Main idea of the article: Cast the AutoML problem as a recommender system: treat unknown performances as missing values in the meta-learning matrix and use matrix factorization techniques (non-probabilistic vs. probabilistic) to complete that matrix.
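Completing a performance matrix with missing entries can be sketched with a plain stochastic-gradient matrix factorization. This is a generic, non-probabilistic illustration and not the ActivMetaL algorithm itself. Rows are assumed to be datasets, columns algorithms, and `None` marks an unknown performance. The function name and all default values are hypothetical.

```python
import random

def complete_matrix(matrix, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit matrix ~ U V^T on observed entries only and return the full product."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if matrix[i][j] is not None]
    for _ in range(epochs):
        for i, j in observed:
            err = matrix[i][j] - sum(U[i][r] * V[j][r] for r in range(rank))
            for r in range(rank):
                u, v = U[i][r], V[j][r]
                # Gradient step on squared error with L2 regularization.
                U[i][r] += lr * (err * v - reg * u)
                V[j][r] += lr * (err * u - reg * v)
    return [[sum(U[i][r] * V[j][r] for r in range(rank)) for j in range(n_cols)]
            for i in range(n_rows)]
```

The entries of the returned matrix at positions that were `None` serve as predicted performances, which can then be used to rank candidate algorithms for a dataset.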
Meeting of 15/11/2018
Paper: Falkner et al: BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Presenter: Heri
Participants: Michèle, Pierre, Loris, Guillaume.
Main idea of the article: Improve the Hyperband approach by using Bayesian Optimization to sample the set of hyperparameter configurations (instead of the uniform sampling used in standard Hyperband).
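The place where the sampling strategy plugs in can be sketched with a single successive-halving bracket, the building block of Hyperband. This is a simplified sketch, not the BOHB code: full Hyperband runs several brackets with different trade-offs, and the model-based sampler is not implemented here. `sample_config` and `evaluate` are hypothetical callables supplied by the user, where `evaluate(config, budget)` returns a loss (lower is better).

```python
def successive_halving(sample_config, evaluate, n_configs=9, min_budget=1, eta=3):
    """Run one bracket: evaluate all configs cheaply, keep the best 1/eta,
    multiply the budget by eta, and repeat until one config remains.

    `sample_config` is the hook that would be uniform sampling in
    standard Hyperband and a model-based sampler in a BOHB-like variant.
    """
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return configs[0]
```

Swapping the sampler changes which configurations enter the bracket while the early-stopping schedule stays the same.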
Meeting of 25/10/2018
Paper: Madrid et al: Towards AutoML in the presence of Drift: first results
Presenter: Zhengying
Participants: Loris, Heri, Marc, Michèle, Guillaume Charpiat
Main idea of the article: In a lifelong learning + concept drift + AutoML setting, the authors use auto-sklearn + a drift detector + several model adaptation methods (e.g. retrain the model completely, update weights, add a model, etc.) and report some basic results for the AutoML3 challenge.
Meeting of 11/10/2018
Paper: Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston "SMASH: One-Shot Model Architecture Search through HyperNetworks".
Presenter: Guillaume Doquet
Participants: Loris, Heri, Zhengying, Pierre, Guillaume Charpiat
Main idea of the article: #TODO: Guillaume
Slides (#TODO: Guillaume)
Meeting of 20/09/2018
Paper: Lorraine, Jonathan, and David Duvenaud. "Stochastic Hyperparameter Optimization through Hypernetworks." arXiv preprint arXiv:1802.09419 (2018).
Presenter: Pierre
Participants: Michèle, Marc, Guillaume Charpiat, Heri, Zhengying
Main idea of the article: #TODO: Pierre
Slides (#TODO: Pierre)
Meeting of 14/06/2018
Paper: Linnan, Wang; Yiyang, Zao; Yuu, Jinnai "AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search".
Presenter: Heri
Participants: Michèle, Marc, Zhengying, Pierre
Main idea of the article: Use Monte Carlo Tree Search to find an optimal architecture for a neural network. The proposed approach exploits the "block" design introduced in NAS (Neural Architecture Search). Another component that speeds up the search is the meta-DNN model (which predicts the performance of a given architecture).
Meeting of 07/06/2018
PhD seminar
Paper : Wolpert, David H., and William G. Macready. "No free lunch theorems for optimization." IEEE transactions on evolutionary computation 1.1 (1997): 67-82.
Paper: Wolpert, David H. "The lack of a priori distinctions between learning algorithms." Neural computation 8.7 (1996): 1341-1390.
Presenter: Zhengying
Participants: Guillaume C., Lisheng, Victor B., Olivier, Diviyan, Théophile, Victor E., Giancarlo
Main idea of the article: Any two (optimization) algorithms work equally well when their performance is averaged across all possible problems.
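The statement above can be checked by brute force on a tiny search space. The sketch assumes a domain of three points, binary objective values, and two deterministic search strategies that never revisit a point. Both strategies and the performance measure (best value found after a given number of evaluations) are illustrative choices.

```python
from itertools import product

DOMAIN = (0, 1, 2)

def fixed_order(history):
    """Visit points 0, 1, 2 in order, ignoring observed values."""
    return len(history)

def adaptive(history):
    """Start at point 2, then pick the next point based on the last value seen."""
    if not history:
        return 2
    visited = [x for x, _ in history]
    remaining = [x for x in DOMAIN if x not in visited]
    last_value = history[-1][1]
    return remaining[0] if last_value == 1 else remaining[-1]

def average_best(algorithm, n_evals):
    """Average, over all 2**3 objective functions, of the best value found."""
    functions = list(product((0, 1), repeat=len(DOMAIN)))
    total = 0
    for f in functions:
        history = []
        for _ in range(n_evals):
            x = algorithm(history)
            history.append((x, f[x]))
        total += max(y for _, y in history)
    return total / len(functions)
```

Comparing `average_best(fixed_order, m)` with `average_best(adaptive, m)` for m = 1, 2, 3 shows that the two strategies have identical average performance once all possible objective functions are weighted equally.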
Meeting of 17/05/2018
Paper : Al-Shedivat, Maruan, et al. "Continuous adaptation via meta-learning in nonstationary and competitive environments.". ICLR 2018.
Presenter: Lisheng
Participants: Zhengying, Heri, Guillaume D.
Main idea of the article: Learn (via optimizing over-task loss) to continuously adapt policy to nonstationary (modeled as competitive agents) environments. Solve purely RL problems.
Meeting of 03/05/2018
Paper : Lisha, Li, et al. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". ICLR 2017.
# TODO: Aris
Main idea of the article:
Meeting of 12/04/2018
Evolving the Topology of Large Scale Deep Neural Networks
Presenter: Guillaume DOQUET
Participants: Isabelle, Marc, Michèle, Laurent, Aris, Heri, Pierre, Guillaume, Zhengying
Main idea of the article: Use an evolutionary strategy to learn the structure of a deep neural network. The algorithm operates on 2 levels simultaneously: the macro scale (number and type of layers) and the layer scale (parameters of that layer). Results are presented on CIFAR10, CIFAR100, MNIST, and Fashion-MNIST, beating or rivaling state-of-the-art performance.
Meeting of 05/04/2018
Large-Scale Evolution of Image Classifiers
Learning Transferable Architectures for scalable image recognition
Presenter: Pierre Wolinski
Participants: Zhengying, Guillaume Charpiat, Heri, ?
Main idea of the articles:
1st article: basic evolutionary algorithm;
2nd article: train a RNN to build a convolutional neural network, where blocks of layers have the same structure (inception-like networks).
Slides : 1st article, 2nd article
Meeting of 29/03/2018
Monte Carlo tree search for algorithm selection (MOSAIC)
Presenter: Heri
Participants: Zhengying, Guillaume (Charpiat, Doquet), Pierre, Aris
Main idea of the article: Tackle the hyperparameter optimization problem using Monte Carlo tree search: state (values of the hyperparameters already chosen), action (value of the next hyperparameter to choose), reward (CV score). The algorithm is composed of two parts: a bandit part (designed for algorithm selection: random forest, SVM, ...) and MCTS (for preprocessing and algorithm configuration). This new idea produces good (first) results, but many improvements remain to be made.
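The bandit part for algorithm selection can be sketched with the classic UCB1 rule. This is an assumption made for illustration: the summary above does not say which bandit policy is used. Each arm stands for one learning algorithm and is a hypothetical callable that returns a reward such as a cross-validation score in [0, 1].

```python
import math

def ucb1(arms, n_rounds, c=math.sqrt(2)):
    """Pull arms for n_rounds with UCB1 and return the pull count per arm.

    Every arm is tried once, then the arm with the highest
    mean reward + exploration bonus is chosen at each round.
    """
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, n_rounds + 1):
        if t <= len(arms):
            i = t - 1
        else:
            i = max(range(len(arms)),
                    key=lambda a: sums[a] / counts[a]
                    + c * math.sqrt(math.log(t) / counts[a]))
        sums[i] += arms[i]()
        counts[i] += 1
    return counts
```

In an MCTS setting, the same kind of rule is typically applied at each tree node to choose which child (here, which hyperparameter value) to explore next.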
Meeting of 22/03/2018
Synthesis of AutoML Reading Group (21/11/2017-22/03/2018) (yes with color! and you can do that too!)
Presenter: Zhengying
Participants: Isabelle, les 2 Guillaumes, Heri, Pierre, Lisheng
Main idea of the article: Describe the AutoML problem in a comprehensive and intuitive optimization manner, formulate many existing AutoML approaches in a uniform way, attach each approach to one step in the classic machine learning pipeline and make some discussion on future research ideas.
Meeting of 08/03/2018
Paper : Swersky, Kevin, Jasper Snoek, and Ryan Prescott Adams. "Freeze-thaw Bayesian optimization." arXiv preprint arXiv:1406.3896 (2014).
Presenter: Lisheng
Participants: Isabelle, Marc, Guillaume Charpiat, Heri, Pierre, Zhengying
Main idea of the article: A strategy for efficiently choosing hyperparameters: pause the training of models that are not promising. Training curves are modeled as samples from a Gaussian process.
Meeting of 01/03/2018
Paper : M. Feurer, A. Klein, K. Eggensperger, J.T. Springenberg, M. Blum, F. Hutter "Efficient and Robust Automated Machine Learning" (NIPS 2015)
Presenter: Guillaume Doquet
Participants: Pierre, Zhengying, Isabelle, Guillaume Charpiat
Main idea of the article: Combine meta-learning, tree-based Bayesian Optimization (SMAC) and ensemble method (ensemble selection) to tackle AutoML problems.
Meeting of 22/02/2018
Paper: Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown (International Conference on Learning and Intelligent Optimization 2011)
Presenter: Pierre
Participants: Zhengying, Guillaume D, Guillaume C, Lisheng, Isabelle
Main idea of the article: Presentation of the hyperparameter search algorithm SMAC. SMAC is an SMBO-based algorithm that uses random forests to model the performance of hyperparameter configurations. Moreover, it handles the case where one tunes the hyperparameters over multiple instance sets. (Slides)
Questions & Remarks: instance set possibly refers to a split of a data set between a train set and a validation set
Meeting of 08/02/2018
Paper: R. Bardenet, M. Brendel, B. Kégl, M. Sebag "Collaborative hyperparameter tuning" ICML (2013).
Presenter: Heri
Participants: Zhengying, Isabelle, Aris, Pierre, Heri
Main idea of the article: By tuning hyperparameters collaboratively on multiple datasets, one can incorporate (expert) knowledge from similar tasks to improve Bayesian hyperparameter search. Hyperparameter ranking is used (instead of validation score) to assess the quality of a hyperparameter configuration. (slides)
Meeting of 01/02/2018
Paper: Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., & Murphy, K. "Progressive neural architecture search." arXiv preprint arXiv:1712.00559 (2017).
Presenter: Aris
Participants: Guillaume Charpiat, Guillaume Doquet, Heri, Lisheng, Zhengying, Aris
Main idea of the article: Learn an RNN that estimates the quality of a CNN sub-module ("cell") generated using multiple blocks (each chosen from some fixed options of convolutions and pooling operators).
When expanding the cell structure, the RNN is used to prune the search space. (Slides)
Meeting of 25/01/2018
Paper : Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu. "Spatial transformer networks." arXiv preprint arXiv:1506.02025v3 (2016).
Presenter : Lisheng
Participants : Michèle, Guillaume Charpiat, Aris, Heri, Guillaume Doquet
Main idea of the article : A spatial transformer network (STN), which learns an appropriate transformation of the input feature map, is proposed as a module to insert into existing architectures to make the task (e.g. classification) easier for later layers. This is possible mainly because the STN is differentiable. (Slides)
Idea for potential use: A spatial transformer network can be viewed as a data generator constrained by final performance of the entire network.
Meeting of 18/01/2018
Paper : Koutník, Jan, Juergen Schmidhuber, and Faustino Gomez. "A frequency-domain encoding for neuroevolution." arXiv preprint arXiv:1212.6521 (2012).
Presenter : Zhengying
Participants : Michèle, Isabelle, Guillaume Charpiat, Lisheng, Aris, Heri, Guillaume Doquet
Main idea of the article : Solve Octopus Arm Problem by using a few Fourier coefficients (chromosome) to compactly represent recurrent neural networks and using Natural Evolution Strategy to select promising prior distribution of neural networks. (Slides)
Heri and Zhengying will run experiments following Michèle's idea
Meeting of 11/01/2018
Paper : Ravid Shwartz-Ziv and Naftali Tishby, "Opening the Black Box of Deep Neural Networks via Information" (Arxiv, March 2017)
Presenter : Guillaume Doquet
Participants : Guillaume Charpiat, Cyril, Zhengying, Heri, Guillaume Doquet
Main idea of the article : Deep neural networks go through 2 distinct phases during training. In the first phase, the mutual information between each hidden layer and the labels increases. In the second phase, the mutual information between the layers and the data decreases. In other words, a compressed latent representation of the data is found. This is a byproduct of the stochastic nature of the gradient descent. (Slides)
Meeting of 04/01/2018
Paper: Saxe, Andrew M., et al. "On Random Weights and Unsupervised Feature Learning" (ICML 2011).
Presenter: Pierre Wolinski
Participants: Michèle, Guillaume Doquet, Guillaume Charpiat, Zhengying, Heri, Aris
Main idea of the article: In some cases, untrained neural networks are almost as accurate as trained neural networks. By studying the Fourier transform of the convolution, we are able to explain these results. Moreover, the article gives a heuristic for architecture selection. (Slides)
Compare the eigenvalue distribution of a random filter and a trained filter
Replace the random layer applied over an image with the Fourier transform of the image => better ? worse ?
Meeting of 21/12/2017
Paper: Domhan et al. "Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves" (IJCAI 2015).
Presenter: Heri
Participants: Michèle, Guillaume (Doquet, Charpiat), Isabelle, Pierre, Zhengying
Main idea of the article: Speed up hyperparameter search by predicting the performance of the model. A set of parametric functions is used to extrapolate the learning curve, and runs that are unlikely to beat the best performance observed so far are stopped. (Slides)
Run the code on new dataset
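The extrapolate-then-stop idea can be sketched with a single parametric curve. This is a deliberately simplified illustration: it fits one power law y(t) = a - b * t^(-c) by grid search over c and least squares for a and b, and compares a point prediction with the best score so far, whereas the summary above mentions a set of parametric functions. The function names, the grid and the `horizon` parameter are assumptions. Scores are assumed to be accuracies (higher is better).

```python
def fit_power_law(ys, c_grid=None):
    """Fit y(t) = a - b * t**(-c) to scores ys observed at t = 1..len(ys)."""
    c_grid = c_grid or [k / 10 for k in range(1, 31)]
    ts = range(1, len(ys) + 1)
    best = None
    for c in c_grid:
        zs = [t ** -c for t in ts]
        z_mean, y_mean = sum(zs) / len(zs), sum(ys) / len(ys)
        szz = sum((z - z_mean) ** 2 for z in zs)
        szy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
        slope = szy / szz                      # linear regression of y on z
        a, b = y_mean - slope * z_mean, -slope
        sse = sum((a - b * z - y) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

def should_stop(ys, best_so_far, horizon):
    """Stop a run if its extrapolated score at `horizon` is below the best seen."""
    a, b, c = fit_power_law(ys)
    return a - b * horizon ** -c < best_so_far
```

A probabilistic version would replace the point prediction with an estimate of the probability that the run exceeds `best_so_far`, and stop when that probability falls below a threshold.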
Meeting of 30/11/2017
Paper: Klein, Aaron, et al. "Fast bayesian optimization of machine learning hyperparameters on large datasets." arXiv preprint arXiv:1605.07079 (2016).
Presenter: Zhengying
Participants: Isabelle, Michèle, Marc, Lisheng, Guillaume (Doquet), Heri
Main idea of the article: Use Bayesian Optimization to do hyperparameter selection, with faster training (thus faster loss evaluation) using sampled sub-dataset, following a strategy that chooses next point to evaluate by maximizing information gain per computational cost on the distribution of the global minimum of the goal function (e.g. validation error w.r.t hyperparameter)
Slides: 10 pages, contains also a very brief introduction to Bayesian Optimization and Gaussian Process, with a small exercise ;)
Remarks & questions:
In Bayesian Optimization, the Bayesian philosophy is applied, but Bayes theorem is not used
Isabelle and Lisheng are using an approach with similar idea (maximize knowledge gain per computational cost) and which is even more general
Zhengying will run the code of the authors to test the performance of the algorithm
Meeting of 21/11/2017
Paper: Munoz, Mario A., et al. "Instance Spaces for Machine Learning Classification." Mach. Learn (2017).
Presenter: Guillaume Doquet
Participants: Michèle, Lisheng, Heri, Zhengying
Main idea of the article: Extend the Algorithm Selection Problem framework suggested by Rice to gain knowledge on how well different combinations of algorithms and datasets can perform, or to objectively measure the performance of an algorithm, using 2-d visualisation in a so-called instance space. For each instance (a dataset and a classification problem), many features are computed and then selected. A performance metric is adopted, then an SVM is used for fitting.