AutoML (Humania) reading group

Organizers: Haozhe (sunhaozhe275940200@gmail.com), Heri (heri@lri.fr), Zhengying (zhengying.liu@inria.fr)

Link to Zotero AutoML group

Even though this Reading Group is called AutoML for historical reasons, you are free to present papers that you wish to share from your own field or elsewhere.

Next meeting

TBA

Paper: TBA

Presenter: TBA

Cycle of presenters

Haozhe, Adrian, Adrien, Herilalaina, Romain, Saumya, Zhen, Zhengying

If you are not in this list but want to give presentations, please send a message to the organizers.

Practical information

During the Covid period, the meetings will be online via Skype. The link is here:

https://join.skype.com/aGf0iOHs5yJp

Legacy: Room 2014, DIGITEO (Bat. 660) or via video call link

Add to calendar / TAU calendar (Zimbra login required)

Leave Questions / Remarks

Propose a talk

Place to put questions during the presentation: Overleaf link

To-Read List

Link to this list in Zotero group.

Everyone is welcome to suggest and add papers to read (send your Zotero username or email address to the organizers).

Participants

"tau-seminars" <tau-seminars@inria.fr> (only send to this mail)

"tau" <tau@lri.fr>

"guillaume charpiat" <guillaume.charpiat@inria.fr>

"Isabelle Guyon" <guyon@clopinet.com>

"Michele Sebag" <Michele.Sebag@lri.fr>

"doquet" <Guillaume.Doquet@lri.fr>

"fissore" <giancarlo.fissore@lri.fr>

"Marc Schoenauer" <marc.schoenauer@inria.fr>

"Laurent Cetinsoy" <laurent.cetinsoy@gmail.com>

"loris felardos" <loris.felardos@lri.fr>

Michael Vaccaro <michael.vaccaro@student-cs.fr>

Hugo Sonnery <hugo.sonnery@student-cs.fr>

Past Meetings

April 21, 2021

Paper: Hyperparameter Ensembles for Robustness and Uncertainty Quantification

Presenter: Romain

Slides

Participants: Haozhe, Zhengying, Marc, Zak

Takeaway:

building an ensemble is easy
an ensemble of models can be used for uncertainty quantification
diversity is important in an ensemble
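
As a minimal illustration of the second and third points (not the method of the paper), the sketch below averages the class probabilities of a few ensemble members and uses the entropy of the averaged prediction as an uncertainty score. The member outputs are made-up numbers, and the function names are hypothetical.

```python
import math

def ensemble_predict(member_probs):
    """Average per-class probabilities over ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical outputs of three ensemble members on two different inputs.
agreeing = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]   # members agree
disagreeing = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # members are diverse

mean_agree = ensemble_predict(agreeing)
mean_disagree = ensemble_predict(disagreeing)

# Disagreement between members shows up as higher predictive entropy.
print(entropy(mean_agree), entropy(mean_disagree))
```

Averaging only reveals uncertainty when the members differ from each other, which is one way to read the remark on diversity.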

April 14, 2021

Paper: Meta Pseudo Labels

Presenter: Zhengying

Slides

Participants: Adrien, Adrian, Zhen, Haozhe, Alessandro, Hung Nguyen

Takeaway:

Use a (constantly changing) teacher network and student network to achieve semi-supervised learning
Interesting ideas for semi-supervised learning, integrating ideas from meta-learning, NAS and other fields
Achieved 90.2% top-1 accuracy on ImageNet

March 17, 2021

Paper: Free Lunch for Few-shot Learning: Distribution Calibration

Presenter: Heri

Slides

Participants: Adrien, Adrian, Zhengying

Takeaway: One can bypass expensive fine-tuning with a simple distribution calibration over the embedded features. The paper proposes to estimate the data statistics (mean and variance) of the target task by transferring the statistics of neighboring classes seen during training. The target statistics are then used to generate more examples for the support set. Finally, a simple classifier is trained on the augmented data. The approach beats some complex and expensive few-shot learners.
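
The calibration step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: variances are per-dimension (diagonal), the base-class statistics are made-up numbers, and the names calibrate, sample_features, k and alpha are hypothetical.

```python
import random

def calibrate(support_feature, base_stats, k=2, alpha=0.1):
    """Estimate mean/variance for a novel class from its k nearest base classes.

    base_stats maps a base-class name to (mean, variance), both lists of floats.
    Variances are per-dimension (a diagonal simplification).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Base classes whose mean is closest to the support example.
    nearest = sorted(base_stats.values(),
                     key=lambda stats: sq_dist(stats[0], support_feature))[:k]
    dim = len(support_feature)
    # Calibrated mean: average of the neighbor means and the support example.
    mean = [(sum(m[d] for m, _ in nearest) + support_feature[d]) / (len(nearest) + 1)
            for d in range(dim)]
    # Calibrated variance: average neighbor variance plus a spread term alpha.
    var = [sum(v[d] for _, v in nearest) / len(nearest) + alpha for d in range(dim)]
    return mean, var

def sample_features(mean, var, n, rng):
    """Draw n synthetic feature vectors from the calibrated Gaussian."""
    return [[rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)] for _ in range(n)]

# Hypothetical base-class statistics in a 2-D feature space.
base_stats = {
    "cat": ([1.0, 0.0], [0.2, 0.2]),
    "dog": ([1.2, 0.2], [0.3, 0.1]),
    "car": ([-3.0, 4.0], [0.5, 0.5]),
}
support = [1.1, 0.4]  # the single labelled example of a novel class
mean, var = calibrate(support, base_stats, k=2, alpha=0.1)
augmented = [support] + sample_features(mean, var, n=5, rng=random.Random(0))
```

A simple classifier (e.g. logistic regression) would then be trained on the augmented features of all novel classes.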

March 10, 2021

Paper: Cross-validation for selecting a model selection procedure

Presenter: Adrien

Slides

Participants: Adrian, Haozhe, Zhen, Zhengying

Takeaway: the following 3 statements about cross-validation (CV) are NOT generally true:

Leave-one-out (LOO) CV has smaller bias but larger variance than K-Fold CV
Better estimation of the prediction error by CV means better model selection
The best method to use for model selection is 10-fold CV (but actually it seems to perform well and is a good trade-off)

February 10, 2021

Paper: Compositional generalization through meta sequence-to-sequence learning

Presenter: Adrian

Slides

Participants: Zhen, Haozhe, Isabelle, Michael, Adrien

Takeaway: compositional learning on SCAN tasks (seq2seq) provides a testbed for experimenting with the selection of meta-training data to induce inductive biases in a model. It would be interesting to see if similar experiments can be performed on more complex data (e.g. images, ...).

January 27, 2021

Paper: Meta-Learning Symmetries by Reparameterization

Presenter: Haozhe Sun

Slides

Participants: Adrian, Romain, Zhen, Zhengying

Takeaway: this paper presents a way to automatically learn equivariances from data that share a common underlying symmetry. The approach uses matrix factorization and a meta-learning framework.

July 2, 2020

Paper: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Presenter: Dhiaeddine YOUSFI

Meeting of 11/06/2020

Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Presenter: Zhengying

Participants: all TAUers were there!

Main idea of the article: Empirical work that hypothesizes the existence of winning tickets (sub-networks that train faster and reach better performance) and offers insights into the behavior of SGD.

slides

Meeting of 04/06/2020

Paper: DAG-GNN: DAG Structure Learning with Graph Neural Networks

Presenter: Shou Xiao

Participants: Heri, Isabelle, Kristin Bennett, Zhengying

Main idea of the article: Structure learning: learn the weighted adjacency matrix of a DAG by formulating the generalized linear SEM as an optimization problem in the VAE framework.

slides

Meeting of 28/05/2020

Paper: The geography of COVID-19 spread in Italy and implications for the relaxation of confinement measures

Presenter: Martin Cepeda

Participants: all TAUers were there!

Main idea of the article: Using transport-based contact matrices between Italian regions, the authors were able to effectively model bed occupancy, disease spreading and possible post lock-down scenarios for all 107 Italian provinces.

Slides

Meeting of 08/05/2020

Paper1: Geometry-Aware Gradient Algorithms for Neural Architecture Search

Paper2: Meta-Dataset : a dataset of datasets for learning to learn from few examples

Presenter: Romain Egelé, Adrian El Baz

Participants: all TAUers were there!

Main idea of the article 1:

The article focuses on gradient-based neural architecture search with weight sharing. It builds a formal framework for understanding the optimization of these methods, introduces an exponentiated gradient update to accelerate convergence to a stationary point, and adds a KL-divergence regularization to increase the sparsity of the mixture weights. The authors also introduce random search with weight sharing as a new baseline for neural architecture search. Finally, they benchmark existing methods on CIFAR-10 and test an HPO algorithm, ASHA, on the NAS problem.

Main idea of the article 2: #TODO (@Adrian)

Slides:

paper1: slides

paper2 #TODO (@Adrian): please add link here

Meeting of 19/03/2020

Paper: Evolving Neural Networks through Augmenting Topologies

Presenter: Saumya Jetley

Participants: all TAU was there!

Main idea of the article: #TODO (@Saumya)

Slides: #TODO (@Saumya)

Meeting of 3/10/2019

Paper: Each one's 3 favorite papers in ECML PKDD 201

Presenter: Guillaume Doquet, Michèle Sebag, Pierre Wolinski, Victor Berger, Zhengying Liu

Participants: all TAU was there!

Main idea of the article:

Slides: #TODO

Meeting of 11/07/2019

Paper: Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "Darts: Differentiable architecture search." ICLR 2019

Presenter: Zhen

Participants: Zhengying, Zhen

Main idea of the article: Relaxation on the architecture used, like doing lasso on the architecture

Slides: #TODO: @Zhen

Meeting of 20/06/2019

Paper: Adam Gaier and David Ha, "Weight Agnostic Neural Networks". arxiv

Presenter: Zhengying

Participants: Lisheng, Zhen, Nathan

Main idea of the article: Learning / training ONLY the architecture of neural networks using neuroevolution (NEAT), with UNIFORM (or random) weight for ALL neurons.

Slides

Meeting of 23/05/2019

Paper: Chelsea Finn et al. "Online Meta-Learning". ICML 2019

Presenter: Lisheng

Participants: Zhengying, Heri, Zhen, Nathan

Main idea of the article: #TODO: @Lisheng

Slides #TODO: @Lisheng

Meeting of 16/05/2019

Paper: Lukas Hahn, et al. “Fast and Reliable Architecture Selection for Convolution Neural Networks”

Presenter: Adrien

Participants: Zhen, Nathan, Zhengying, Heri, Isabelle

Main idea of the article: #TODO: @Adrien

Slides #TODO: @Adrien

Meeting of 02/05/2019

Paper: Marcin Andrychowicz, et al. “Learning to learn by gradient descent by gradient descent”

Presenter: Nathan

Participants: Zhen, Loris, Zhengying

Main idea of the article: #TODO: @Nathan

Slides #TODO: @Nathan

Meeting of 25/04/2019

Paper: Haifeng Jin, et al. “Auto-Keras: An Efficient Neural Architecture Search System.”

Presenter: Heri

Participants: Guillaume, Isabelle, Lisheng, Nathan, Zhen, Zhengying

Main idea of the article: This paper introduces a new distance metric between two neural architectures. The metric is composed of a layer-wise distance and a connection-wise distance.

Slides

Meeting of 11/04/2019

#TODO: @Zhen

Meeting of 28/03/2019

Paper: Franceschi et al. "Bilevel Programming for Hyperparameter Optimization and Meta-Learning". ICML 2018

Presenter: Zhengying

Participants: Heri, Pierre

Main idea of the article: Use a bi-level optimization formulation to unify Hyperparameter Optimization and Meta-learning, then use a smooth procedure (a sequence of gradient descent steps) to approximate how the solution of the inner problem depends on the outer variables (as in the bi-level formulation).

Slides

Meeting of 21/03/2019

No specific paper. Slides on Automated Deep Learning are presented.

Presenter: Zhengying

Participants: Zhen

Main idea of the article: Formulate AutoML as two layers of learning: supervised learning + reinforcement learning.

Slides

Meeting of 07/03/2019

Paper: de Laroussilhe, Quentin, et al. "Neural Architecture Search Over a Graph Search Space." arXiv preprint arXiv:1812.10666 (2018).

Presenter: Guillaume Doquet

Participants: Heri, Zhengying, Lisheng, Marc, Guillaume C, Adrien, Zhen

Main idea of the article: #TODO: @Guillaume

Slides #TODO: @Guillaume

Meeting of 31/01/2019

Paper: Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML2017

Presenter: Pierre

Participants: Heri, Zhengying, Lisheng

Main idea of the article: #TODO: @Pierre

Slides #TODO: @Pierre

Meeting of 24/01/2019

Paper: Sun-Hosoya et al: ActivMetaL: Algorithm Recommendation with Active Meta Learning (Related papers: paper1, paper2)

Presenter: Lisheng

Participants: Lisheng, Zhengying, Loris

Main idea of the article: Cast the AutoML problem as a recommender-system problem: treat unknown performances as missing values in the meta-learning matrix and use matrix factorization techniques (non-probabilistic vs. probabilistic) to complete that matrix.
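
To make the matrix-completion view concrete, here is a minimal sketch of non-probabilistic matrix factorization fitted by SGD on the observed entries only. It is an illustration, not the ActivMetaL code: the performance matrix is made up (rows are datasets, columns are algorithms, None marks an unknown score), and all names and hyperparameters are hypothetical.

```python
import random

def factorize(matrix, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit matrix ~ U V^T by SGD on the observed (non-None) entries only."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if matrix[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = matrix[i][j] - predict(U, V, i, j)
            for r in range(rank):
                u, v = U[i][r], V[j][r]
                U[i][r] += lr * (err * v - reg * u)
                V[j][r] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    """Predicted score of algorithm j on dataset i."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def observed_mse(matrix, U, V):
    """Mean squared error over the known entries."""
    errs = [(matrix[i][j] - predict(U, V, i, j)) ** 2
            for i in range(len(matrix)) for j in range(len(matrix[0]))
            if matrix[i][j] is not None]
    return sum(errs) / len(errs)

# Hypothetical meta-learning matrix: 4 datasets x 3 algorithms.
perf = [
    [0.90, 0.70, None],
    [0.85, None, 0.60],
    [None, 0.65, 0.55],
    [0.80, 0.60, 0.50],
]
U, V = factorize(perf)
# Recommend, for dataset 0, the algorithm with the highest predicted score.
best_algo = max(range(3), key=lambda j: predict(U, V, 0, j))
```

The predicted values at the None positions are what a recommender would use to decide which algorithm to try next on a new dataset.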

Slides

Meeting of 15/11/2018

Paper: Falkner et al: BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Presenter: Heri

Participants: Michèle, Pierre, Loris, Guillaume.

Main idea of the article: Improve the Hyperband approach by using Bayesian Optimization to sample hyperparameter configurations (instead of the uniform sampling of standard Hyperband).

Slides

Meeting of 25/10/2018

Paper: Madrid et al: Towards AutoML in the presence of Drift: first results

Presenter: Zhengying

Participants: Loris, Heri, Marc, Michèle, Guillaume Charpiat

Main idea of the article: In a lifelong learning + concept drift + AutoML setting, the authors use auto-sklearn + a drift detector + several model adaptation methods (e.g. completely re-train the model, update weights, add a model, etc.) and obtain some basic results for the AutoML3 challenge.

Slides

Meeting of 11/10/2018

Paper: Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston "SMASH: One-Shot Model Architecture Search through HyperNetworks".

Presenter: Guillaume Doquet

Participants: Loris, Heri, Zhengying, Pierre, Guillaume Charpiat

Main idea of the article: #TODO: Guillaume

Slides (#TODO: Guillaume)

Meeting of 20/09/2018

Paper: Lorraine, Jonathan, and David Duvenaud. "Stochastic Hyperparameter Optimization through Hypernetworks." arXiv preprint arXiv:1802.09419 (2018).

Presenter: Pierre

Participants: Michèle, Marc, Guillaume Charpiat, Heri, Zhengying

Main idea of the article: #TODO: Pierre

Slides (#TODO: Pierre)

Meeting of 14/06/2018

Paper: Linnan, Wang; Yiyang, Zao; Yuu, Jinnai "AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search".

Presenter: Heri

Participants: Michèle, Marc, Zhengying, Pierre

Main idea of the article: Use Monte Carlo Tree Search to find an optimal architecture for a neural network. The proposed approach exploits the "block" design introduced in NAS (Neural Architecture Search). Another component that speeds up the search is the meta-DNN model (which predicts the performance of a given architecture).

Slides

Meeting of 07/06/2018

PhD seminar

Paper : Wolpert, David H., and William G. Macready. "No free lunch theorems for optimization." IEEE transactions on evolutionary computation 1.1 (1997): 67-82.

Paper: Wolpert, David H. "The lack of a priori distinctions between learning algorithms." Neural computation 8.7 (1996): 1341-1390.

Presenter: Zhengying

Participants: Guillaume C., Lisheng, Victor B., Olivier, Diviyan, Théophile, Victor E., Giancarlo

Main idea of the article: Any two (optimization) algorithms work equally well when their performance is averaged across all possible problems.
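
The statement can be checked by brute force on a tiny search space. The sketch below is an illustration under stated assumptions, not taken from the papers: a three-point domain, binary objective values, two hypothetical deterministic search strategies that never revisit a point, and "best value found within a budget" as the performance measure.

```python
from itertools import product

X = [0, 1, 2]

def fixed_order(history):
    """Visit points in the order 0, 1, 2, ignoring observed values."""
    visited = dict(history)
    return [x for x in X if x not in visited][0]

def adaptive(history):
    """Start at 2; afterwards pick the smallest unvisited point if the last
    observed value was 1, otherwise the largest unvisited point."""
    if not history:
        return 2
    visited = dict(history)
    remaining = [x for x in X if x not in visited]
    return remaining[0] if history[-1][1] == 1 else remaining[-1]

def best_after(algorithm, f, budget):
    """Best objective value found by `algorithm` on f within `budget` evaluations."""
    history = []
    for _ in range(budget):
        x = algorithm(history)
        history.append((x, f[x]))
    return max(y for _, y in history)

def average_performance(algorithm, budget):
    """Average over all 2**3 = 8 objective functions f: X -> {0, 1}."""
    functions = list(product([0, 1], repeat=len(X)))
    return sum(best_after(algorithm, f, budget) for f in functions) / len(functions)

for budget in (1, 2, 3):
    print(budget, average_performance(fixed_order, budget),
          average_performance(adaptive, budget))
```

Both strategies obtain the same average for every budget, even though they behave differently on individual functions.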

Slides

Meeting of 17/05/2018

Paper : Al-Shedivat, Maruan, et al. "Continuous adaptation via meta-learning in nonstationary and competitive environments.". ICLR 2018.

Presenter: Lisheng

Participants: Zhengying, Heri, Guillaume D.

Main idea of the article: Learn (via optimizing over-task loss) to continuously adapt policy to nonstationary (modeled as competitive agents) environments. Solve purely RL problems.

Slides

Meeting of 03/05/2018

Paper : Lisha, Li, et al. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". ICLR 2017.

# TODO: Aris

Presenter:

Participants:

Main idea of the article:

Slides

Meeting of 12/04/2018

Evolving the Topology of Large Scale Deep Neural Networks

Presenter: Guillaume DOQUET

Participants: Isabelle, Marc, Michèle, Laurent, Aris, Heri, Pierre, Guillaume, Zhengying

Main idea of the article: Use an evolutionary strategy to learn the structure of a deep neural network. The algorithm operates on 2 levels simultaneously: the macro scale (number and type of layers) and the layer scale (parameters of that layer). Results are presented on CIFAR10, CIFAR100, MNIST, and Fashion-MNIST, beating or rivaling state-of-the-art performance.

Slides

Meeting of 05/04/2018

Large-Scale Evolution of Image Classifiers

Learning Transferable Architectures for scalable image recognition

Presenter: Pierre Wolinski

Participants: Zhengying, Guillaume Charpiat, Heri, ?

Main idea of the articles:

- 1st article: basic evolutionary algorithm;
- 2nd article: train a RNN to build a convolutional neural network, where blocks of layers have the same structure (inception-like networks).

Slides : 1st article, 2nd article

Meeting of 29/03/2018

Monte Carlo tree search for algorithm selection (MOSAIC)

Presenter: Heri

Participants: Zhengying, Guillaume (Charpiat, Doquet), Pierre, Aris

Main idea of the article: Tackle the hyperparameter optimization problem using Monte Carlo tree search: state (values of the hyperparameters already chosen), action (value of the next hyperparameter to choose), reward (CV score). The algorithm is composed of two parts: a bandit part (designed for algorithm selection: random forest, SVM, ...) and MCTS (for preprocessing and algorithm configuration). This new idea produces good (first) results, but many improvements remain to be made.
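
The bandit part can be illustrated with the classic UCB1 rule. This is a generic sketch and an assumption on our side: the notes do not say which bandit rule MOSAIC uses, and the reward functions below are hypothetical noisy CV scores rather than real model evaluations.

```python
import math
import random

def ucb1(arms, rounds, rng):
    """Run UCB1 over `arms`, a dict mapping a name to a function rng -> reward in [0, 1].

    Returns how many times each arm was pulled.
    """
    counts = {name: 0 for name in arms}
    totals = {name: 0.0 for name in arms}
    for t in range(1, rounds + 1):
        untried = [name for name in arms if counts[name] == 0]
        if untried:
            choice = untried[0]  # pull every arm once first
        else:
            # Mean reward plus an exploration bonus that shrinks with more pulls.
            choice = max(arms, key=lambda n: totals[n] / counts[n]
                         + math.sqrt(2.0 * math.log(t) / counts[n]))
        reward = arms[choice](rng)
        counts[choice] += 1
        totals[choice] += reward
    return counts

def noisy_score(mean, sigma=0.05):
    """Hypothetical CV score: Gaussian noise around `mean`, clipped to [0, 1]."""
    return lambda rng: min(1.0, max(0.0, rng.gauss(mean, sigma)))

# Made-up average CV scores for three candidate algorithms.
arms = {
    "random_forest": noisy_score(0.9),
    "svm": noisy_score(0.6),
    "knn": noisy_score(0.5),
}
counts = ucb1(arms, rounds=1000, rng=random.Random(0))
print(counts)
```

Most of the evaluation budget ends up on the arm with the highest average score, while the weaker arms are still tried occasionally.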

Slides

Meeting of 22/03/2018

Synthesis of AutoML Reading Group (21/11/2017-22/03/2018) (yes with color! and you can do that too!)

Presenter: Zhengying

Participants: Isabelle, les 2 Guillaumes, Heri, Pierre, Lisheng

Main idea of the article: Describe the AutoML problem in a comprehensive and intuitive optimization manner, formulate many existing AutoML approaches in a uniform way, attach each approach to one step in the classic machine learning pipeline and make some discussion on future research ideas.

Slides

Meeting of 08/03/2018

Paper : Swersky, Kevin, Jasper Snoek, and Ryan Prescott Adams. "Freeze-thaw Bayesian optimization." arXiv preprint arXiv:1406.3896 (2014).

Presenter: Lisheng

Participants: Isabelle, Marc, Guillaume Charpiat, Heri, Pierre, Zhengying

Main idea of the article: A strategy for efficiently choosing hyperparameters: pause the training of models that are not promising. Training curves are modeled as samples from a Gaussian process.

Slides

Meeting of 01/03/2018

Paper : M. Feurer, A. Klein, K. Eggensperger, J.T. Springenberg, M. Blum, F. Hutter "Efficient and Robust Automated Machine Learning" (NIPS 2015)

Presenter: Guillaume Doquet

Participants: Pierre, Zhengying, Isabelle, Guillaume Charpiat

Main idea of the article: Combine meta-learning, tree-based Bayesian Optimization (SMAC) and ensemble method (ensemble selection) to tackle AutoML problems.

Slides

Meeting of 22/02/2018

Paper: Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown "Sequential Model-Based Optimization for General Algorithm Configuration" (International Conference on Learning and Intelligent Optimization 2011)

Presenter: Pierre

Participants: Zhengying, Guillaume D, Guillaume C, Lisheng, Isabelle

Main idea of the article: Presentation of the hyperparameter search algorithm SMAC. SMAC is an SMBO-based algorithm that uses random forests to model performance as a function of the hyperparameters. Moreover, it handles the case where the hyperparameters are tuned for multiple instance sets. (Slides)

Questions & Remarks: instance set possibly refers to a split of a data set between a train set and a validation set

Meeting of 08/02/2018

Paper: R. Bardenet, M. Brendel, B. Kégl, M. Sebag "Collaborative hyperparameter tuning" ICML (2013).

Presenter: Heri

Participants: Zhengying, Isabelle, Aris, Pierre, Heri

Main idea of the article: By collaboratively tuning hyperparameters on multiple datasets, one can incorporate (expert) knowledge from similar tasks to improve Bayesian hyperparameter search. Hyperparameter ranking is used (instead of validation score) to assess the quality of a hyperparameter setting. (slides)

Meeting of 01/02/2018

Paper: Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., & Murphy, K. "Progressive neural architecture search." arXiv preprint arXiv:1712.00559 (2017).

Presenter: Aris

Participants: Guillaume Charpiat, Guillaume Doquet, Heri, Lisheng, Zhengying, Aris

Main idea of the article: Learn an RNN that estimates the quality of a CNN sub-module ("cell") generated using multiple blocks (each chosen from some fixed options of convolutions and pooling operators).

When expanding the cell structure, the RNN is used to prune the search space. (Slides)

Meeting of 25/01/2018

Paper : Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu. "Spatial transformer networks." arXiv preprint arXiv:1506.02025v3 (2016).

Presenter : Lisheng

Participants : Michèle, Guillaume Charpiat, Aris, Heri, Guillaume Doquet

Main idea of the article : A spatial transformer network (STN), which learns an appropriate transformation of the input feature map, is proposed as a module to insert into existing architectures to make the task of later layers (e.g. classification) easier. This is possible mainly because the STN is differentiable. (Slides)

To-do:

1. Idea for potential use: A spatial transformer network can be viewed as a data generator constrained by final performance of the entire network.

Meeting of 18/01/2018

Paper : Koutník, Jan, Juergen Schmidhuber, and Faustino Gomez. "A frequency-domain encoding for neuroevolution." arXiv preprint arXiv:1212.6521 (2012).

Presenter : Zhengying

Participants : Michèle, Isabelle, Guillaume Charpiat, Lisheng, Aris, Heri, Guillaume Doquet

Main idea of the article : Solve Octopus Arm Problem by using a few Fourier coefficients (chromosome) to compactly represent recurrent neural networks and using Natural Evolution Strategy to select promising prior distribution of neural networks. (Slides)

To-do:

Heri and Zhengying will run experiments following Michèle's idea

Meeting of 11/01/2018

Paper : Ravid Shwartz-Ziv and Naftali Tishby, "Opening the Black Box of Deep Neural Networks via Information" (Arxiv, March 2017)

Presenter : Guillaume Doquet

Participants : Guillaume Charpiat, Cyril, Zhengying, Heri, Guillaume Doquet

Main idea of the article : Deep neural networks go through 2 distinct phases during training. In the first phase, the mutual information between each hidden layer and the labels increases. In the second phase, the mutual information between the layers and the data decreases. In other words, a compressed latent representation of the data is found. This is a byproduct of the stochastic nature of the gradient descent. (Slides)

Meeting of 04/01/2018

Paper: Saxe, Andrew M., et al. "On Random Weights and Unsupervised Feature Learning" (ICML 2011).

Presenter: Pierre Wolinski

Participants: Michèle, Guillaume Doquet, Guillaume Charpiat, Zhengying, Heri, Aris

Main idea of the article: In some cases, untrained neural networks are almost as accurate as trained neural networks. By studying the Fourier transform of the convolution, we are able to explain these results. Moreover, the article gives a heuristic for architecture selection. (Slides)

To-do:

1. Compare the eigenvalue distribution of a random filter and a trained filter
2. Replace the random layer applied over an image with the Fourier transform of the image => better ? worse ?

Meeting of 21/12/2017

Paper: Domhan et al. "Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves" (IJCAI 2015).

Presenter: Heri

Participants: Michèle, Guillaume (Doquet, Charpiat), Isabelle, Pierre, Zhengying

Main idea of the article: Speed up hyperparameter search by predicting the performance of the model. Use a set of parametric functions to extrapolate the learning curve. Stop runs that are unlikely to beat the best performance observed so far. (Slides)

To-do:

1. Run the code on new dataset

Meeting of 30/11/2017

Paper: Klein, Aaron, et al. "Fast bayesian optimization of machine learning hyperparameters on large datasets." arXiv preprint arXiv:1605.07079 (2016).

Presenter: Zhengying

Participants: Isabelle, Michèle, Marc, Lisheng, Guillaume (Doquet), Heri

Main idea of the article: Use Bayesian Optimization for hyperparameter selection, with faster training (and thus faster loss evaluation) on sampled sub-datasets, following a strategy that chooses the next point to evaluate by maximizing the information gain per computational cost on the distribution of the global minimum of the objective function (e.g. validation error w.r.t. the hyperparameters).

Slides: 10 pages, contains also a very brief introduction to Bayesian Optimization and Gaussian Process, with a small exercise ;)

Remarks & questions:

1. In Bayesian Optimization, the Bayesian philosophy is applied, but Bayes theorem is not used
2. Isabelle and Lisheng are using an approach with similar idea (maximize knowledge gain per computational cost) and which is even more general

To-do:

1. Zhengying will run the code of the authors to test the performance of the algorithm

Meeting of 21/11/2017

Paper: Munoz, Mario A., et al. "Instance Spaces for Machine Learning Classification." Mach. Learn (2017).

Presenter: Guillaume Doquet

Participants: Michèle, Lisheng, Heri, Zhengying

Main idea of the article: Extend the Algorithm Selection Problem framework suggested by Rice to gain knowledge on how well different combinations of algorithms and datasets can perform, or to objectively measure the performance of an algorithm, using 2-D visualisation in a so-called instance space. For each instance (a dataset and a classification problem), many features are computed and then selected. A performance measure is adopted, then an SVM is used for fitting.